Transfer Learning for High-dimensional Quantile Regression with Distribution Shift

Ruiqi Bai, Yijiao Zhang, Hanbo Yang, Zhongyi Zhu
2024-11-30
Abstract: Information from related source studies can often enhance the findings of a target study. However, the distribution shift between target and source studies can severely impact the efficiency of knowledge transfer. In the high-dimensional regression setting, existing transfer approaches mainly focus on the parameter shift. In this paper, we focus on the high-dimensional quantile regression with knowledge transfer under three types of distribution shift: parameter shift, covariate shift, and residual shift. We propose a novel transferable set and a new transfer framework to address the above three discrepancies. Non-asymptotic estimation error bounds and source detection consistency are established to validate the availability and superiority of our method in the presence of distribution shift. Additionally, an orthogonal debiased approach is proposed for statistical inference with knowledge transfer, leading to sharper asymptotic results. Extensive simulation results as well as real data applications further demonstrate the effectiveness of our proposed procedure.
Methodology, Statistics Theory, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to carry out effective transfer learning for high-dimensional quantile regression in the presence of distribution shift. It considers three types of shift: parameter shift, covariate shift, and residual shift. Each can reduce the efficiency of knowledge transfer from source data to target data, or even cause negative transfer.

### Background and Challenges of the Problem

1. **Parameter Shift**: When the model parameters of the target and source domains differ (i.e., \(\beta^* \neq w^{(k)}\)), directly transferring source information can introduce bias.
2. **Covariate Shift**: When the covariate distributions of the target and source domains differ (i.e., \(P(x^{(0)}) \neq P(x^{(k)})\)), directly pooling the data can give inaccurate estimates.
3. **Residual Shift**: When the conditional distributions of the model residuals differ between the target and source domains (i.e., \(P(\epsilon^{(0)}|x^{(0)}) \neq P(\epsilon^{(k)}|x^{(k)})\)), direct transfer can add noise and degrade estimation accuracy.

### Core Contributions of the Paper

To address these problems, the paper makes the following contributions:

1. **New Transferable Set**:
   - A new transferable set \(C_h\) is defined. It bounds the parameter shift through \(\|\delta^{(k)}\|_1 \leq h_1\) and also restricts the density at the \(\tau\)-quantile through \(E[f^{(0)}(0|x^{(0)})] \leq h_2 E[f^{(k)}(0|x^{(k)})]\). This allows parameter and residual shifts to be handled simultaneously.
2. **Complete Transfer Learning Framework**:
   - A method for jointly estimating the target parameter \(\beta^*\) and the contrast vectors \(\{\delta^{(k)}\}\) is proposed, implemented through a constrained optimization algorithm. The method remains robust under covariate shift and improves estimation accuracy by controlling parameter and residual shifts.
3. **Theoretical Guarantee**:
   - Non-asymptotic \(\ell_1/\ell_2\) error bounds are derived, showing the advantage of the proposed method under various distribution shifts. Source detection consistency is also established, so that the transferable sources are correctly identified.
4. **Statistical Inference**:
   - A debiasing method based on Neyman orthogonality is proposed for statistical inference on the quantile coefficients. It avoids estimating high-dimensional precision matrices and allows non-homogeneous data distributions, which alleviates the problems caused by covariate and residual shifts.

### Summary

By introducing a new transferable set and a transfer learning framework, the paper addresses knowledge transfer in high-dimensional quantile regression under multiple distribution shifts. Theoretical analysis, simulations, and real data applications demonstrate the effectiveness of the method when the data distributions differ between studies.
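
To make the two conditions defining the transferable set \(C_h\) concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the contrast vector is \(\delta^{(k)} = w^{(k)} - \beta^*\) and that estimates of the expected densities \(E[f^{(0)}(0|x^{(0)})]\) and \(E[f^{(k)}(0|x^{(k)})]\) are already available as plain numbers. The function names `check_loss` and `in_transferable_set` are hypothetical and chosen only for illustration; the check loss is the standard quantile regression loss \(\rho_\tau(u) = u(\tau - 1\{u < 0\})\).

```python
import numpy as np


def check_loss(u, tau):
    """Standard quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def in_transferable_set(beta_target, w_source, dens_target, dens_source, h1, h2):
    """Check the two conditions of the transferable set for one source.

    Assumptions (not taken from the paper's code):
      - the contrast vector is delta = w_source - beta_target;
      - dens_target and dens_source are given estimates of the expected
        conditional residual density at zero for target and source.
    """
    delta = np.asarray(w_source, dtype=float) - np.asarray(beta_target, dtype=float)
    param_ok = np.sum(np.abs(delta)) <= h1       # ||delta||_1 <= h1
    density_ok = dens_target <= h2 * dens_source  # E[f0] <= h2 * E[fk]
    return bool(param_ok and density_ok)


if __name__ == "__main__":
    beta = [1.0, 0.0, 0.0]
    w_k = [1.0, 0.1, 0.0]
    print(check_loss([2.0, -2.0], tau=0.25))
    print(in_transferable_set(beta, w_k, dens_target=0.4, dens_source=0.5, h1=0.2, h2=1.0))
```

In this sketch a source is excluded either because its coefficients are too far from the target in \(\ell_1\) norm or because its residual density at the \(\tau\)-quantile is too small relative to the target; in practice both quantities would have to be estimated from data.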