Double Sparsity Garrotized Kernel Machine in High-Dimensional Partially Linear Model
Xinyi Zhao, Yaohua Rong, Junze Lin, Maozai Tian, Jinwen Liang
DOI: https://doi.org/10.1080/03610918.2024.2329244
2024-01-01
Communications in Statistics - Simulation and Computation
Abstract: Obtaining excellent prediction accuracy in the high-dimensional partially linear model is particularly important. However, it is difficult to achieve because of the complex relationship between the nonparametric covariates and the response. Irrelevant covariates and unimportant data also commonly degrade the prediction performance of the model, and high-dimensional data pose a further challenge for modern statistical analysis. To overcome these difficulties, we propose the Double Sparsity Garrotized Kernel Machine (DSGKM) method for prediction, together with an efficient algorithm and an adjusted version of that algorithm. Specifically, we estimate the nonparametric components using the kernel machine technique and simultaneously impose L1-norm penalties to select relevant covariates and retain representative data in the final model. We also provide a convergence analysis of the adjusted algorithm. The advantages of our method are that it (i) sufficiently captures the complex relationship between the nonparametric covariates and the response; (ii) identifies relevant covariates and selects representative data; and (iii) achieves higher computational efficiency, especially in situations where both the parametric and nonparametric components are high-dimensional. Results on both simulated and real data show that the proposed method outperforms existing methods, even when outliers are present.
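To make the idea of combining a kernel machine with two L1-norm penalties concrete, the following is a minimal sketch and not the DSGKM algorithm itself. It assumes a partially linear model y = X beta + f(Z) + noise, represents f at the sample points as K alpha with a fixed-bandwidth Gaussian kernel (the garrotized kernel is not implemented here), and penalizes both beta and alpha with L1 norms, on the assumption that sparsity in alpha corresponds to keeping only representative data points. The solver is plain proximal gradient descent (ISTA), swapped in for illustration. All names (`fit_double_l1`, `lam_beta`, `lam_alpha`, `bandwidth`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(Z, bandwidth=1.0):
    """Gaussian kernel matrix with a single fixed bandwidth (an assumption)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def soft_threshold(v, t):
    """Proximal operator of the (weighted) L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(X, K, y, beta, alpha, lam_beta, lam_alpha):
    """(1 / 2n) * squared residual norm plus the two L1 penalties."""
    resid = y - X @ beta - K @ alpha
    return (resid @ resid) / (2.0 * len(y)) \
        + lam_beta * np.abs(beta).sum() + lam_alpha * np.abs(alpha).sum()


def fit_double_l1(X, Z, y, lam_beta=0.1, lam_alpha=0.1,
                  bandwidth=1.0, n_iter=2000):
    """Illustrative ISTA solver; not the authors' algorithm."""
    n, p = X.shape
    K = gaussian_kernel(Z, bandwidth)
    A = np.hstack([X, K])                      # joint design [X, K]
    # Lipschitz constant of the gradient of the smooth part.
    L = np.linalg.norm(A, 2) ** 2 / n
    lam = np.concatenate([np.full(p, lam_beta), np.full(n, lam_alpha)])
    theta = np.zeros(p + n)
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta[:p], theta[p:], K
```

In this sketch, zero entries of the returned `beta` mark linear covariates that were dropped, and zero entries of `alpha` mark observations that do not contribute to the fitted nonparametric part. Selecting covariates inside the kernel itself would require a different parameterization that is not shown here.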