Distributed optimization and statistical learning for large-scale penalized expectile regression

Yingli Pan
DOI: https://doi.org/10.1007/s42952-020-00074-5
2020-06-09
Journal of the Korean Statistical Society
Abstract: Large-scale data from various research fields are not only heterogeneous and sparse but also difficult to store on a single machine. Expectile regression is a popular alternative for modeling heterogeneous data. In this paper, we devise a distributed optimization approach to SCAD and adaptive LASSO penalized expectile regression, where the observations are randomly partitioned across multiple machines. We construct a penalized communication-efficient surrogate loss (CSL) function. Computationally, our method based on the CSL function requires only the master machine to solve a regular M-estimation problem, while other worker machines compute the gradient of the loss function on local data. Our method matches the estimation error bound of the centralized method during consecutive rounds of communication. Under some mild assumptions, we establish the oracle properties of the SCAD and adaptive LASSO penalized expectile regression. We then develop a modified alternating direction method of multipliers (ADMM) algorithm for the implementation of the proposed estimator. A series of simulation studies is conducted to assess the finite-sample performance of the proposed estimator. Applications to an HIV study demonstrate the practicability of the proposed method.
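The division of labor described in the abstract (workers compute local gradients, the master solves a surrogate problem) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the standard asymmetric squared (expectile) loss, equal-sized partitions so that the global gradient is the plain average of the local gradients, and the usual CSL form in which the master's local loss is corrected by the gap between its local gradient and the global gradient at an initial estimate. The SCAD and adaptive LASSO penalties and the ADMM solver are omitted, and all function names and the toy data are hypothetical.

```python
def expectile_weight(residual, tau):
    # Asymmetric weight: tau for non-negative residuals, 1 - tau otherwise.
    return tau if residual >= 0 else 1.0 - tau


def local_gradient(X, y, beta, tau):
    # Gradient of the average expectile loss (1/n) * sum w(r_i) * r_i^2,
    # with r_i = y_i - x_i' beta, computed on one machine's data.
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        r = yi - sum(a * b for a, b in zip(xi, beta))
        w = expectile_weight(r, tau)
        for j in range(p):
            grad[j] += -2.0 * w * r * xi[j] / n
    return grad


def csl_gradient(X_master, y_master, beta, beta_bar, grads_at_bar, tau):
    # Gradient of the unpenalized surrogate loss at beta:
    #   local gradient at beta
    #   - (local gradient at beta_bar - global gradient at beta_bar).
    # grads_at_bar holds one gradient per machine, evaluated at beta_bar
    # (assumption: equal partition sizes, so a plain average is used).
    p = len(beta)
    g_local = local_gradient(X_master, y_master, beta, tau)
    g_local_bar = local_gradient(X_master, y_master, beta_bar, tau)
    m = len(grads_at_bar)
    g_global_bar = [sum(g[j] for g in grads_at_bar) / m for j in range(p)]
    return [g_local[j] - g_local_bar[j] + g_global_bar[j] for j in range(p)]


# Toy data on two machines (hypothetical values); machine 0 is the master.
machines = [
    ([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]], [1.2, -0.3, 2.9]),
    ([[1.0, 1.5], [1.0, 0.0], [1.0, -0.5]], [2.1, 0.4, 0.1]),
]
tau = 0.3
beta_bar = [0.0, 1.0]  # initial estimate broadcast to all machines

# One communication round: every machine sends its gradient at beta_bar.
grads_at_bar = [local_gradient(X, y, beta_bar, tau) for X, y in machines]

# The master can now evaluate the surrogate gradient at any candidate beta.
X0, y0 = machines[0]
surrogate_grad = csl_gradient(X0, y0, [0.1, 0.9], beta_bar, grads_at_bar, tau)
```

At the initial estimate the surrogate gradient coincides with the averaged global gradient, which is why each round needs only one gradient vector per worker to be communicated.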
statistics & probability