Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery

Caixing Wang, Ziliang Shen

2024-06-01

Abstract: In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative to least squares regression, valued for its robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses major challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression problem into a least-squares optimization. By applying a double-smoothing approach, we extend a previous Newton-type distributed approach without requiring the restrictive assumption of independence between the error term and the covariates. An efficient algorithm is developed that enjoys high computational and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method.
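To make the non-smoothness mentioned in the abstract concrete, the following is a minimal Python sketch of the check loss, rho_tau(u) = u * (tau - 1{u < 0}), next to one generic smoothed variant obtained by convolving it with a Gaussian kernel of bandwidth h. The smoothed version is only an illustration of what smoothing the check loss can look like; it is an assumption of this sketch and is not claimed to be the double-smoothing construction used in the paper. The function names and the bandwidth values are hypothetical.

```python
import numpy as np
from math import erf, sqrt, pi

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}); it has a kink at u = 0."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def gaussian_smoothed_check_loss(u, tau, h):
    """Check loss convolved with a N(0, h^2) kernel (illustrative assumption).

    Uses rho_tau(u) = |u|/2 + (tau - 1/2) * u and the folded-normal mean
    E|a + Z| = sqrt(2/pi) * exp(-a^2 / 2) + a * erf(a / sqrt(2)) with a = u / h.
    """
    u = np.asarray(u, dtype=float)
    a = u / h
    erf_vec = np.vectorize(erf)
    mean_abs = sqrt(2.0 / pi) * np.exp(-0.5 * a**2) + a * erf_vec(a / sqrt(2.0))
    return 0.5 * h * mean_abs + (tau - 0.5) * u

if __name__ == "__main__":
    grid = np.linspace(-1.0, 1.0, 5)
    print(check_loss(grid, tau=0.3))
    print(gaussian_smoothed_check_loss(grid, tau=0.3, h=0.5))
```

The smoothed loss is differentiable everywhere and approaches the check loss as h shrinks, which is why smoothing makes gradient- and Newton-type updates easier to define.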

Machine Learning, Methodology

What problem does this paper attempt to address?

The paper attempts to address the following issues:

1. **Estimation Efficiency and Support Recovery in Distributed High-Dimensional Linear Quantile Regression**: This paper focuses on estimation and variable selection (support recovery) in high-dimensional linear quantile regression within a distributed environment. The non-smooth nature of the check loss function in quantile regression poses challenges for computation and theoretical analysis in a distributed setting.
2. **Improved Methods under Non-Independence Assumptions**: Traditional methods often assume that the error terms are independent of the covariates, an assumption that is difficult to verify in practice. This paper proposes a new distributed high-dimensional sparse quantile regression method (DHSQR) that does not require the strict independence assumption of error terms.
3. **Theoretical Guarantees**: By introducing a double smoothing transformation, the proposed method not only achieves nearly optimal convergence rates but also attains the same support recovery accuracy as in a single-machine environment after a constant number of iterations.
4. **Computational and Communication Efficiency**: The proposed algorithm demonstrates high computational and communication efficiency in distributed systems, requiring only the transmission of low-dimensional gradient vectors instead of high-dimensional Hessian matrices.

In summary, this paper aims to develop a new distributed high-dimensional linear quantile regression method that can efficiently address estimation and variable selection problems in large-scale datasets in practical applications, with solid theoretical guarantees.
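The communication argument in the summary above (sending gradient vectors rather than Hessian matrices) can be illustrated with a small Python sketch. It is not the DHSQR algorithm: it only shows that, with equal-sized shards, averaging per-machine subgradients of the check loss reproduces the full-data subgradient, and it counts how many numbers each strategy would transmit. The data-generating setup, the shard count `m`, and all function names are assumptions made for illustration.

```python
import numpy as np

def local_subgradient(X, y, beta, tau):
    """Subgradient of the average check loss on one machine's shard.

    For residual r = y - x'beta, d/d(beta) rho_tau(r) = -x * (tau - 1{r < 0}).
    """
    resid = y - X @ beta
    return -X.T @ (tau - (resid < 0)) / len(y)

# Synthetic data (hypothetical sizes): n samples, p covariates, m machines.
rng = np.random.default_rng(0)
n, p, m, tau = 1200, 50, 4, 0.5
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 1.0                                 # sparse signal: 3 active covariates
y = X @ beta_true + rng.standard_t(df=3, size=n)    # heavy-tailed noise
beta = np.zeros(p)                                  # current iterate held by every machine

# Each machine computes a p-dimensional subgradient on its own shard.
shards = zip(np.array_split(X, m), np.array_split(y, m))
local_grads = [local_subgradient(Xk, yk, beta, tau) for Xk, yk in shards]

# The central machine averages them; shards are equal-sized, so this
# matches the subgradient computed on the pooled data.
global_grad = np.mean(local_grads, axis=0)

floats_sent_gradients = m * p       # one p-vector per machine
floats_sent_hessians = m * p * p    # one p-by-p matrix per machine
print(floats_sent_gradients, floats_sent_hessians)
```

With p in the thousands, the gap between m * p and m * p * p numbers per round is what makes gradient-only communication attractive in the high-dimensional setting.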


Distributed High-dimensional Regression under a Quantile Loss Function

Distributed Bootstrap Simultaneous Inference for High-Dimensional Quantile Regression

Distributed fault estimation of networked systems using quantized measurements

Distributed Quantile Regression over Sensor Networks

Communication-efficient estimation and inference for high-dimensional quantile regression based on smoothed decorrelated score

Communication-Efficient Nonparametric Quantile Regression via Random Features

Distributed Online Quantile Regression over Networks with Quantized Communication

Distributed quantile regression for massive heterogeneous data

Sparse Estimation Via ℓ_q Optimization Method in High-Dimensional Linear Regression

Transfer Learning for High-dimensional Quantile Regression with Distribution Shift

A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating

Online Quantile Regression

Penalized weighted smoothed quantile regression for high‐dimensional longitudinal data

A LQD-RKHS-based distribution-to-distribution regression method and its application to restore distributions of missing SHM data

Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings

Optimal subsampling algorithm for composite quantile regression with distributed data

Regression via Arbitrary Quantile Modeling

Estimation and inference for transfer learning with high-dimensional quantile regression

Robust reduced rank regression in a distributed setting

LQD-RKHS-based distribution-to-distribution regression methodology for restoring the probability distributions of missing SHM data