CV_L2E_sparse_ncv performs k-fold cross-validation for robust sparse regression under the L2 criterion. Available penalties include lasso, MCP and SCAD.

CV_L2E_sparse_ncv(
  y,
  X,
  beta0,
  tau0,
  lambdaSeq,
  penalty = "MCP",
  nfolds = 5,
  seed = 1234,
  method = "median",
  max_iter = 100,
  tol = 1e-04,
  trace = TRUE
)

Arguments

y

Response vector

X

Design matrix

beta0

Initial vector of regression coefficients; can be omitted.

tau0

Initial precision estimate; can be omitted.

lambdaSeq

A decreasing sequence of values for the tuning parameter lambda; can be omitted.

penalty

Penalty to apply. Available penalties include lasso, MCP and SCAD. Default is "MCP".

nfolds

The number of cross-validation folds. Default is 5.

seed

Users can set the seed of the random number generator to obtain reproducible results.

method

Whether to use the median or the mean to compute the objective. Default is "median".

max_iter

Maximum number of iterations

tol

Relative tolerance

trace

Whether to trace the progress of the cross-validation
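
The roles of lambdaSeq, nfolds and seed can be illustrated with a small base-R sketch. It is not the package's internal code: the helper assign_folds and the balanced shuffling scheme are assumptions made for illustration only. It shows a decreasing lambda sequence of the kind used in the Examples, and how fixing a seed makes a random fold assignment repeatable.

```r
# Illustrative base-R sketch; NOT the internals of CV_L2E_sparse_ncv.
n <- 100       # number of observations (hypothetical)
nfolds <- 5
seed <- 1234

# A decreasing lambda sequence on a log scale, as in the Examples.
lambdaSeq <- 10^seq(-1, -2, length.out = 20)
stopifnot(all(diff(lambdaSeq) < 0))  # strictly decreasing

# Hypothetical helper: shuffle balanced fold labels under a fixed seed.
assign_folds <- function(n, nfolds, seed) {
  set.seed(seed)
  sample(rep(seq_len(nfolds), length.out = n))
}

fold_a <- assign_folds(n, nfolds, seed)
fold_b <- assign_folds(n, nfolds, seed)

identical(fold_a, fold_b)  # TRUE: same seed, same assignment
table(fold_a)              # 20 observations in each of the 5 folds
```

Calling the function twice with the same seed should likewise assign the observations to the same folds, which is what makes the CVE curves comparable across runs.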

Value

Returns a list containing the mean and standard error of the cross-validation error (CVE and CVSE) for each value of lambda (vectors), the index of the lambda with the minimum CVE and the lambda value itself (scalars), the index of the lambda value with the 1SE CVE and the lambda value itself (scalars), the sequence of lambda values used in the regression (vector), and a vector listing which fold each element of y was assigned to.
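
How the minimum-CVE and 1SE choices relate to the CVE and CVSE vectors can be sketched on toy numbers. The values below are made up, and the one-standard-error rule shown (the largest lambda whose CVE is within one standard error of the minimum) is the common convention, assumed here rather than taken from the package source.

```r
# Toy cross-validation results; the numbers are made up for illustration.
lambdaSeq <- c(1.0, 0.5, 0.25, 0.125, 0.0625)  # decreasing
cve  <- c(2.00, 1.30, 1.05, 1.00, 1.20)        # mean CV error per lambda
cvse <- c(0.20, 0.15, 0.10, 0.10, 0.12)        # standard error per lambda

# Index and value of the lambda with the smallest CVE.
min_idx <- which.min(cve)
lambda_min <- lambdaSeq[min_idx]

# One-standard-error rule (assumed convention): take the largest lambda
# whose CVE is at most min(CVE) plus its standard error. Because lambdaSeq
# is decreasing, the largest lambda has the smallest qualifying index.
threshold <- cve[min_idx] + cvse[min_idx]
one_se_idx <- min(which(cve <= threshold))
lambda_1se <- lambdaSeq[one_se_idx]

c(min_idx = min_idx, lambda_min = lambda_min,
  one_se_idx = one_se_idx, lambda_1se = lambda_1se)
```

The 1SE choice is never smaller than the minimum-CVE lambda under this convention, so it favours a sparser model at a small cost in estimated error.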

Examples

## Completes in 20 seconds
set.seed(12345)
n <- 100
tau <- 1
f <- matrix(c(rep(2,5), rep(0,45)), ncol = 1)
X <- X0 <- matrix(rnorm(n*50), nrow = n)
y <- y0 <- X0 %*% f + (1/tau)*rnorm(n)

## Clean Data
lambda <- 10^seq(-1, -2, length.out=20)
cv <- CV_L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda, penalty="SCAD", seed=1234, nfolds=2)
#> Starting CV fold #1
#> Starting CV fold #2
(lambda_min <- cv$lambda.min)
#> [1] 0.1
sol <- L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda_min, penalty="SCAD")
#>    user  system elapsed
#>   0.015   0.000   0.015
r <- y - X %*% sol$Beta
ix <- which(abs(r) > 3/sol$Tau)
l2e_fit <- X %*% sol$Beta
plot(y, l2e_fit, ylab='Predicted values', pch=16, cex=0.8)
points(y[ix], l2e_fit[ix], pch=16, col='blue', cex=0.8)

## Contaminated Data
i <- 1:5
y[i] <- 2 + y0[i]
X[i,] <- 2 + X0[i,]
cv <- CV_L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda, penalty="SCAD", seed=1234, nfolds=2)
#> Starting CV fold #1
#> Starting CV fold #2
(lambda_min <- cv$lambda.min)
#> [1] 0.1
sol <- L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda_min, penalty="SCAD")
#>    user  system elapsed
#>   0.058   0.000   0.058
r <- y - X %*% sol$Beta
ix <- which(abs(r) > 3/sol$Tau)
l2e_fit <- X %*% sol$Beta
plot(y, l2e_fit, ylab='Predicted values', pch=16, cex=0.8)
points(y[ix], l2e_fit[ix], pch=16, col='blue', cex=0.8)