Cross validation for L2E sparse regression with existing penalization methods

CV_L2E_sparse_ncv performs k-fold cross-validation for robust sparse regression under the L2 criterion. Available penalties include lasso, MCP and SCAD.

CV_L2E_sparse_ncv(
  y,
  X,
  beta0,
  tau0,
  lambdaSeq,
  penalty = "MCP",
  nfolds = 5,
  seed = 1234,
  method = "median",
  max_iter = 100,
  tol = 1e-04,
  trace = TRUE
)

Arguments

y	Response vector
X	Design matrix
beta0	Initial vector of regression coefficients, can be omitted
tau0	Initial precision estimate, can be omitted
lambdaSeq	A decreasing sequence of tuning parameter lambda, can be omitted
penalty	Available penalties include lasso, MCP and SCAD.
nfolds	The number of cross-validation folds. Default is 5.
seed	Users can set the seed of the random number generator to obtain reproducible results.
method	Median or mean to compute the objective
max_iter	Maximum number of iterations
tol	Relative tolerance
trace	Whether to trace the progress of the cross-validation

Value

Returns a list object containing the mean and standard error of the cross-validation error -- CVE and CVSE -- for each value of k (vectors), the index of the lambda with the minimum CVE and the lambda value itself (scalars), the index of the lambda value with the 1SE CVE and the lambda value itself (scalars), the sequence of lambda used in the regression (vector), and a vector listing which fold each element of y was assigned to

Examples

## Completes in 20 seconds

set.seed(12345)
n <- 100
tau <- 1
f <- matrix(c(rep(2,5), rep(0,45)), ncol = 1)
X <- X0 <- matrix(rnorm(n*50), nrow = n)
y <- y0 <- X0 %*% f + (1/tau)*rnorm(n)

## Clean Data
lambda <- 10^seq(-1, -2, length.out=20)
cv <- CV_L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda, penalty="SCAD", seed=1234, nfolds=2)
#> Starting CV fold #1
#> Starting CV fold #2
(lambda_min <- cv$lambda.min)
#> [1] 0.1

sol <- L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda_min, penalty="SCAD")
#>    user  system elapsed 
#>   0.015   0.000   0.015 
r <- y - X %*% sol$Beta
ix <- which(abs(r) > 3/sol$Tau)
l2e_fit <- X %*% sol$Beta

plot(y, l2e_fit, ylab='Predicted values', pch=16, cex=0.8)
points(y[ix], l2e_fit[ix], pch=16, col='blue', cex=0.8)

## Contaminated Data
i <- 1:5
y[i] <- 2 + y0[i]
X[i,] <- 2 + X0[i,]

cv <- CV_L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda, penalty="SCAD", seed=1234, nfolds=2)
#> Starting CV fold #1
#> Starting CV fold #2
(lambda_min <- cv$lambda.min)
#> [1] 0.1

sol <- L2E_sparse_ncv(y=y, X=X, lambdaSeq=lambda_min, penalty="SCAD")
#>    user  system elapsed 
#>   0.058   0.000   0.058 
r <- y - X %*% sol$Beta
ix <- which(abs(r) > 3/sol$Tau)
l2e_fit <- X %*% sol$Beta

plot(y, l2e_fit, ylab='Predicted values', pch=16, cex=0.8)
points(y[ix], l2e_fit[ix], pch=16, col='blue', cex=0.8)