High-Dimensional Sparse Additive Hazards Regression

Wei Lin, Jinchi Lv
DOI: https://doi.org/10.1080/01621459.2012.746068
2012-12-27
Abstract: High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by modern applications in high-throughput genomic data analysis and credit risk analysis. In this article, we propose a class of regularization methods for simultaneous variable selection and estimation in the additive hazards model, by combining the nonconcave penalized likelihood approach and the pseudoscore method. In a high-dimensional setting where the dimensionality can grow fast, polynomially or nonpolynomially, with the sample size, we establish the weak oracle property and oracle property under mild, interpretable conditions, thus providing strong performance guarantees for the proposed methodology. Moreover, we show that the regularity conditions required by the $L_1$ method are substantially relaxed by a certain class of sparsity-inducing concave penalties. As a result, concave penalties such as the smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), and smooth integration of counting and absolute deviation (SICA) can significantly improve on the $L_1$ method and yield sparser models with better prediction performance. We present a coordinate descent algorithm for efficient implementation and rigorously investigate its convergence properties. The practical utility and effectiveness of the proposed methods are demonstrated by simulation studies and a real data example.
Methodology, Statistics Theory
What problem does this paper attempt to address?
This paper addresses simultaneous variable selection and estimation in high-dimensional sparse models for censored survival data. Specifically, it considers the setting in which the number of features \(p\) can grow polynomially or nonpolynomially with the sample size \(n\), and proposes a class of regularization methods for the additive hazards model that combine the nonconcave penalized likelihood approach with the pseudoscore method.

### Main Problems

1. **High-Dimensional Sparse Modeling**: How to select the important variables and estimate their effects when the number of features is much larger than the number of samples.
2. **Censored Survival Data**: How to handle censoring in survival analysis, especially in a high-dimensional setting.
3. **Variable Selection and Estimation**: How to achieve variable selection and parameter estimation simultaneously through regularization, improving both the predictive performance and the interpretability of the model.

### Solutions

The paper proposes a class of regularization methods that combine the nonconcave penalized likelihood approach with the pseudoscore method. The main ingredients are:

- **Concave Penalty Functions**: Sparsity-inducing concave penalties such as SCAD, MCP, and SICA substantially relax the regularity conditions required by the $L_1$ method, and yield sparser models with better prediction performance.
- **Pseudoscore Method**: The pseudoscore function of the additive hazards model is used to construct the loss function, which makes variable selection and parameter estimation tractable in a high-dimensional setting.
- **Theoretical Guarantees**: The weak oracle property and the oracle property are established, providing strong performance guarantees for the proposed methods.

### Theoretical Contributions

- **Weak Oracle Property**: Under mild, interpretable conditions, the regularized estimator is shown to have the weak oracle property, that is, it achieves consistent model selection and estimation.
- **Oracle Property**: Under some additional eigenvalue conditions, the regularized estimator is shown to have the oracle property, that is, it is asymptotically as efficient as the oracle estimator that knows the true sparse model in advance.

### Practical Applications

- **Algorithm Implementation**: A coordinate descent algorithm is presented for efficient implementation, and its convergence properties are rigorously investigated.
- **Simulation and Empirical Studies**: The effectiveness and practical utility of the methods are demonstrated through simulation studies and a real data example.

In summary, the paper proposes a new class of regularization methods for variable selection and estimation in high-dimensional sparse additive hazards regression with censored survival data, and supports them with both theoretical analysis and empirical evidence.
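
To make the pseudoscore loss and the coordinate descent step more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the usual least-squares-type form of the pseudoscore loss for the additive hazards model, \(L(\beta) = \tfrac{1}{2}\beta^\top V \beta - b^\top \beta\), with time-independent covariates and no tied observation times. The function names (`pseudoscore_quadratic`, `soft_threshold`, `mcp_update`, `coordinate_descent`) and the simulated data are hypothetical and chosen only for illustration. Only the $L_1$ and MCP updates are shown; SCAD and SICA would need their own univariate updates.

```python
import numpy as np

def pseudoscore_quadratic(time, event, Z):
    """Return (V, b) such that the loss is 0.5 * beta'V beta - b'beta.

    Assumes time-independent covariates and no tied observation times.
    """
    order = np.argsort(time)
    t, d, Z = time[order], event[order], Z[order]
    n, p = Z.shape
    V, b, prev = np.zeros((p, p)), np.zeros(p), 0.0
    for k in range(n):
        risk = Z[k:]                    # subjects still at risk on (prev, t[k]]
        zbar = risk.mean(axis=0)        # at-risk covariate average
        centered = risk - zbar
        V += (t[k] - prev) * (centered.T @ centered)
        if d[k]:                        # observed event: add Z_k - zbar
            b += Z[k] - zbar
        prev = t[k]
    return V / n, b / n

def soft_threshold(x, lam):
    """Soft-thresholding operator used by the L1 update."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def mcp_update(z, v, lam, gamma):
    """Minimize 0.5*v*x^2 - z*x + MCP(|x|; lam, gamma); requires v > 1/gamma."""
    if v <= 1.0 / gamma:
        raise ValueError("univariate problem is not convex: need v > 1/gamma")
    if abs(z) <= gamma * lam * v:
        return soft_threshold(z, lam) / (v - 1.0 / gamma)
    return z / v                        # beyond gamma*lam the penalty is flat

def coordinate_descent(V, b, lam, gamma=None, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent; gamma=None gives L1, otherwise MCP."""
    beta = np.zeros(len(b))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(b)):
            if V[j, j] <= 0:
                continue
            # partial residual: b_j minus the contribution of the other coordinates
            z = b[j] - V[j] @ beta + V[j, j] * beta[j]
            if gamma is None:
                new = soft_threshold(z, lam) / V[j, j]
            else:
                new = mcp_update(z, V[j, j], lam, gamma)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

if __name__ == "__main__":
    # Hypothetical simulated data: hazard = 1 + Z @ beta_true, constant in time.
    rng = np.random.default_rng(0)
    n, p = 200, 10
    Z = rng.uniform(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:2] = [2.0, 1.0]
    T = rng.exponential(1.0 / (1.0 + Z @ beta_true))
    C = rng.exponential(1.0, size=n)    # independent censoring times
    V, b = pseudoscore_quadratic(np.minimum(T, C), T <= C, Z)
    lam = 0.1 * np.abs(b).max()
    gamma = 3.0 / V.diagonal().min()    # keeps every univariate MCP problem convex
    print("L1 :", np.round(coordinate_descent(V, b, lam), 3))
    print("MCP:", np.round(coordinate_descent(V, b, lam, gamma=gamma), 3))
```

Because the loss is quadratic in \(\beta\), each coordinate update has a closed form, which is what makes coordinate descent attractive here. Tying `gamma` to the smallest diagonal entry of `V` is a choice made for this sketch so that the MCP update stays well defined on unstandardized covariates; it is not a recommendation from the paper.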