Abstract: Adaptive lasso penalized generalized linear models (GLMs) are a powerful tool for analyzing high-dimensional sparse data when the classical linearity or normality assumptions are not met. In non-distributed environments, the estimation problem for adaptive lasso penalized GLMs is often solved by the coordinate descent algorithm developed in Friedman, Hastie, and Tibshirani (2010), which is implemented in the R package glmnet. However, when applied to distributed big data, this algorithm is usually inflexible or even infeasible because its implementation is not parallel, especially when communication between the central and local machines is expensive, or when the storage and computing capabilities of the central machine are insufficient. In this paper, we propose a new method, QAGLM-alasso, for the adaptive lasso penalized GLM problem in distributed big data by applying the quadratic approximation representation of GLMs, and further develop a path-following algorithm for its estimation based on Least Angle Regression (LARS).
Theoretical analyses show that, under mild regularity conditions, QAGLM-alasso enjoys the oracle property, and the resulting estimator is asymptotically equivalent to the original adaptive lasso estimator. Simulation studies demonstrate that the new algorithm achieves estimation accuracy similar to that of glmnet but is significantly faster than glmnet in distributed environments. We further illustrate the practical performance of the proposed method by analyzing a supersymmetric (SUSY) benchmark data set.
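
The abstract does not spell out how the quadratic approximation is built, so the following is only a generic illustration of the idea, not the QAGLM-alasso algorithm itself. It assumes a logistic GLM without intercept, a one-shot scheme in which each local machine sends its maximum likelihood estimate and averaged Hessian to the central machine, and adaptive weights 1/|b_j|. The penalized quadratic problem is solved here by plain coordinate descent rather than the LARS-based path-following algorithm of the paper. All function names, the simulated data, and the value of `lam` are hypothetical.

```python
import numpy as np

def local_mle(X, y, iters=25):
    """Newton-Raphson logistic MLE on one machine.
    Returns the local estimate and the averaged Hessian at that estimate."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        H = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
        g = X.T @ (mu - y) / n
        beta = beta - np.linalg.solve(H, g)
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    H = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
    return beta, H

def combine(summaries, sizes):
    """Central step: merge the local quadratics
    sum_k n_k (beta - b_k)' H_k (beta - b_k) into one quadratic with centre b and curvature H."""
    N = sum(sizes)
    H = sum(n * Hk for (_, Hk), n in zip(summaries, sizes)) / N
    Hb = sum(n * Hk @ bk for (bk, Hk), n in zip(summaries, sizes)) / N
    return np.linalg.solve(H, Hb), H

def alasso_quadratic(H, b, lam, gamma=1.0, sweeps=200):
    """Coordinate descent for
    0.5 (beta - b)' H (beta - b) + lam * sum_j |beta_j| / |b_j|**gamma."""
    w = 1.0 / np.abs(b) ** gamma
    Hb = H @ b
    beta = b.copy()
    for _ in range(sweeps):
        for j in range(len(b)):
            # partial residual that excludes coordinate j
            r = Hb[j] - H[j] @ beta + H[j, j] * beta[j]
            # soft-thresholding update
            beta[j] = np.sign(r) * max(abs(r) - lam * w[j], 0.0) / H[j, j]
    return beta

# Simulated data split across K "machines" (illustrative values only).
rng = np.random.default_rng(0)
K, n, p = 4, 2000, 6
beta_true = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.8])
summaries, sizes = [], []
for _ in range(K):
    X = rng.normal(size=(n, p))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    summaries.append(local_mle(X, y))  # only (b_k, H_k) leave the machine
    sizes.append(n)

b, H = combine(summaries, sizes)
beta_alasso = alasso_quadratic(H, b, lam=0.005)
print(np.round(beta_alasso, 3))
```

In this sketch each machine communicates only a p-vector and a p-by-p matrix, so the central machine never touches the raw data; that is the kind of communication saving the abstract alludes to.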

Distributed optimization and statistical learning for large-scale penalized expectile regression

Expectile regression for analyzing heteroscedasticity in high dimension

Semiparametric Expectile Regression for High-dimensional Heavy-tailed and Heterogeneous Data

Variable Selection in Expectile Regression

Robust Estimation and Shrinkage in Ultrahigh Dimensional Expectile Regression with Heavy Tails and Variance Heterogeneity

Distributed Bootstrap Simultaneous Inference for High-Dimensional Quantile Regression

Unified algorithms for distributed regularized linear regression model

Estimation and testing of expectile regression with efficient subsampling for massive data

Inference for High-Dimensional Linear Expectile Regression with De-Biasing Method

Distributed Linear Regression with Compositional Covariates

Distributed adaptive lasso penalized generalized linear models for big data

Distributed Estimation and Inference for Semi-parametric Binary Response Models

Distributed quantile regression for massive heterogeneous data

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

Renewable estimation in expectile regression model with streaming data sets

Poisson subsampling-based estimation for growing-dimensional expectile regression in massive data

Relative error-based distributed estimation in growing dimensions

An Asynchronous Distributed Expectation Maximization Algorithm for Massive Data: The DEM Algorithm

Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties

Distributed Semi-Supervised Sparse Statistical Inference

Penalized Sparse Covariance Regression with High Dimensional Covariates