Abstract: Adaptive lasso penalized generalized linear models (GLMs) are a powerful tool for analyzing high-dimensional sparse data when the classical linear or normal assumption is not met. In non-distributed environments, the estimation problem for adaptive lasso penalized GLMs is often solved by the coordinate descent based algorithm developed in Friedman, Hastie, and Tibshirani (2010), which is implemented in the R package glmnet. However, when applied to distributed big data, this algorithm is usually inflexible or even infeasible because its implementation is not parallel, especially when communication between the central and local machines is expensive, or when the storage and computing capacity of the central machine is insufficient. In this paper, we propose a new method, QAGLM-alasso, for the adaptive lasso penalized GLM problem in distributed big data, based on the quadratic approximation representation of GLMs, and we develop a path-following algorithm for its estimation based on Least Angle Regression (LARS).
Theoretical analyses show that, under mild regularity conditions, QAGLM-alasso enjoys the oracle property, and the resulting estimator is asymptotically equivalent to the original adaptive lasso estimator. Simulation studies demonstrate that the new algorithm achieves estimation accuracy similar to that of glmnet but is significantly faster than glmnet in distributed environments. We further illustrate the practical performance of the proposed method by analyzing a supersymmetric (SUSY) benchmark data set.
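
To make the quadratic approximation idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the central machine already holds an initial estimate `beta_tilde` (for example an unpenalized fit, with no zero entries) and a positive definite curvature matrix `H`, and it minimizes 0.5 (b - beta_tilde)' H (b - beta_tilde) + lam * sum_j |b_j| / |beta_tilde_j|^gamma. The abstract describes a LARS-based path-following algorithm; this sketch swaps in plain coordinate descent at a single value of `lam` for brevity. All function and parameter names here are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(beta, H, beta_tilde, lam, gamma=1.0):
    """Quadratic approximation of the loss plus the adaptive lasso penalty."""
    p = len(beta)
    d = [beta[j] - beta_tilde[j] for j in range(p)]
    quad = 0.5 * sum(d[j] * H[j][k] * d[k] for j in range(p) for k in range(p))
    penalty = lam * sum(abs(beta[j]) / abs(beta_tilde[j]) ** gamma
                        for j in range(p))
    return quad + penalty


def qa_adaptive_lasso(H, beta_tilde, lam, gamma=1.0, n_iter=200, tol=1e-10):
    """Coordinate descent on the penalized quadratic approximation.

    Assumptions (not taken from the paper): H is symmetric positive
    definite and every entry of beta_tilde is nonzero, so the adaptive
    weights 1 / |beta_tilde_j|^gamma are finite.
    """
    p = len(beta_tilde)
    weights = [1.0 / abs(b) ** gamma for b in beta_tilde]
    beta = list(beta_tilde)  # start from the initial estimate
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Terms of the quadratic's gradient that do not involve beta[j].
            z = H[j][j] * beta_tilde[j] - sum(
                H[j][k] * (beta[k] - beta_tilde[k])
                for k in range(p) if k != j
            )
            new = soft_threshold(z, lam * weights[j]) / H[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Toy inputs: the small coefficient gets a large weight and is set to zero.
H = [[2.0, 0.0], [0.0, 2.0]]
print(qa_adaptive_lasso(H, [2.0, 0.1], lam=0.5))  # → [1.875, 0.0]
```

In this sketch only `H` and `beta_tilde` are needed at the central machine, not the raw observations, which is the kind of saving in communication and storage that the abstract attributes to the quadratic approximation.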

Distributed adaptive lasso penalized generalized linear models for big data

Distributed Bootstrap Simultaneous Inference for High-Dimensional Quantile Regression

Unified algorithms for distributed regularized linear regression model

Adaptive debiased SGD in high-dimensional GLMs with streaming data

Distributed non-convex regularization for generalized linear regression

Distributed optimization and statistical learning for large-scale penalized expectile regression

Distributed quantile regression for massive heterogeneous data

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

Distributed Adaptive Newton Methods with Globally Superlinear Convergence

Least Squares Approximation for a Distributed System

Generalized fused Lasso for grouped data in generalized linear models

Least-Square Approximation for a Distributed System.

Gaussian Graphical Models parallel estimation via coordinate descent neighborhood selection

Group-Based Alternating Direction Method of Multipliers for Distributed Linear Classification

Distributed adaptive Newton methods with global superlinear convergence

A distributed block coordinate descent method for training $l_1$ regularized linear classifiers

Distributed Coordinate Descent for L1-regularized Logistic Regression

A generalization of regularized dual averaging and its dynamics

Distributed Linear Regression with Compositional Covariates

Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data

Efficient sparse Hessian based algorithms for the clustered lasso problem.