Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery

Caixing Wang, Ziliang Shen
2024-06-01
Abstract: In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative to least-squares regression, offering robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses substantial challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression problem into a least-squares optimization problem. By applying a double-smoothing approach, we extend a previous Newton-type distributed approach without the restrictive assumption that the error term is independent of the covariates. We develop an algorithm with high computational and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method.
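The non-smoothness of the check loss, and how smoothing removes it, can be made concrete. The following is a minimal sketch, assuming a Gaussian-kernel convolution smoothing, a standard device in the quantile regression literature; it is not necessarily the paper's double-smoothing construction, and the function names are hypothetical. The check loss has a kink at zero, while its convolution with a Gaussian kernel of bandwidth `h` has the closed form u(τ − Φ(−u/h)) + hφ(u/h), with derivative τ − Φ(−u/h).

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}); not differentiable at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def _std_normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h (hypothetical helper).

    Closed form: u * (tau - Phi(-u/h)) + h * phi(u/h). The result is smooth and convex,
    and it converges to rho_tau(u) as h -> 0.
    """
    return u * (tau - _std_normal_cdf(-u / h)) + h * _std_normal_pdf(u / h)


def smoothed_score(u, tau, h):
    """Derivative of smoothed_check_loss: tau - Phi(-u/h), a smooth version of tau - 1{u < 0}."""
    return tau - _std_normal_cdf(-u / h)


for u in (-2.0, 0.0, 2.0):
    print(u, check_loss(u, 0.3), smoothed_check_loss(u, 0.3, 0.1))
```

Away from zero the two losses agree closely, while at zero the smoothed loss is slightly larger (by h/√(2π)) and has a well-defined derivative. That derivative is what makes gradient- and Newton-type updates usable.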
Machine Learning, Methodology
What problem does this paper attempt to address?
The paper attempts to address the following issues:
1. **Estimation Efficiency and Support Recovery in Distributed High-Dimensional Linear Quantile Regression**: The paper studies estimation and variable selection (support recovery) for high-dimensional linear quantile regression in a distributed environment. The non-smoothness of the check loss function complicates both computation and theoretical analysis in this setting.
2. **Relaxing the Independence Assumption**: Previous distributed methods often assume that the error term is independent of the covariates, an assumption that is difficult to verify in practice. The paper proposes a distributed high-dimensional sparse quantile regression method (DHSQR) that does not require this independence assumption.
3. **Theoretical Guarantees**: Using a double-smoothing approach, the proposed estimator achieves a near-oracle convergence rate and, after a constant number of iterations, attains the same support recovery accuracy as in a single-machine setting.
4. **Computational and Communication Efficiency**: The proposed algorithm is efficient in both computation and communication: machines transmit only p-dimensional gradient vectors rather than full p × p Hessian matrices, where p is the number of covariates.

In summary, the paper develops a distributed high-dimensional linear quantile regression method that handles estimation and variable selection efficiently on large-scale datasets and comes with solid theoretical guarantees.
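The communication pattern behind the efficiency point can be illustrated with a toy example. The following is a minimal sketch, assuming low dimension with no sparsity penalty, equally sized shards, a Gaussian-kernel smoothed check loss, and machine 1 acting as the master. It is a generic Newton-type distributed scheme, not the paper's DHSQR algorithm, and all function names are hypothetical. In each round every machine sends one p-vector (its local gradient), and the master takes a Newton-type step using only its own local Hessian. The simulated error scale depends on a covariate, so errors are not independent of the covariates, but their conditional median is still zero.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal CDF, applied elementwise to a NumPy array."""
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, applied elementwise."""
    return np.exp(-0.5 * np.asarray(z) ** 2) / math.sqrt(2.0 * math.pi)


def local_gradient(X, y, beta, tau, h):
    """Gradient (a p-vector) of the average Gaussian-smoothed check loss on one machine."""
    r = y - X @ beta
    return -X.T @ (tau - norm_cdf(-r / h)) / len(y)


def local_hessian(X, y, beta, h):
    """Hessian (p x p) of the same smoothed loss; only the master computes it."""
    r = y - X @ beta
    w = norm_pdf(r / h) / h  # second derivative of the smoothed loss at each residual
    return (X * w[:, None]).T @ X / len(y)


def distributed_newton(shards, tau=0.5, h=0.5, n_iter=5):
    X1, y1 = shards[0]  # machine 1 acts as the master
    beta = np.linalg.lstsq(X1, y1, rcond=None)[0]  # initial estimate from the master's data only
    for _ in range(n_iter):
        # Each machine communicates one p-vector; with equal shard sizes,
        # the plain average equals the full-sample gradient.
        g = np.mean([local_gradient(X, y, beta, tau, h) for X, y in shards], axis=0)
        H1 = local_hessian(X1, y1, beta, h)  # master's Hessian stands in for the global one
        beta = beta - np.linalg.solve(H1, g)
    return beta


rng = np.random.default_rng(0)
m, n_k, p = 10, 2000, 5
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.0])
shards = []
for _ in range(m):
    X = np.column_stack([np.ones(n_k), rng.standard_normal((n_k, p - 1))])
    # Error scale depends on a covariate; the conditional median of the error is still 0.
    noise = (1.0 + 0.5 * np.abs(X[:, 1])) * rng.standard_normal(n_k)
    shards.append((X, X @ beta_true + noise))

beta_hat = distributed_newton(shards)
print(np.round(beta_hat, 2))
```

Because the master inverts its own Hessian, the fixed point of the iteration is where the averaged (global) gradient vanishes. In this sketch, the distributed estimate therefore converges to the full-sample smoothed estimator while only p-vectors cross the network.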