Abstract: High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by modern applications in high-throughput genomic data analysis and credit risk analysis. In this article, we propose a class of regularization methods for simultaneous variable selection and estimation in the additive hazards model, by combining the nonconcave penalized likelihood approach and the pseudoscore method. In a high-dimensional setting where the dimensionality can grow fast, polynomially or nonpolynomially, with the sample size, we establish the weak oracle property and oracle property under mild, interpretable conditions, thus providing strong performance guarantees for the proposed methodology. Moreover, we show that the regularity conditions required by the $L_1$ method are substantially relaxed by a certain class of sparsity-inducing concave penalties. As a result, concave penalties such as the smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), and smooth integration of counting and absolute deviation (SICA) can significantly improve on the $L_1$ method and yield sparser models with better prediction performance. We present a coordinate descent algorithm for efficient implementation and rigorously investigate its convergence properties. The practical utility and effectiveness of the proposed methods are demonstrated by simulation studies and a real data example.
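
The abstract names three concave penalties without giving their formulas. As a point of reference, here is a minimal sketch of their commonly cited forms next to the $L_1$ penalty. It is not code from the paper: the function names and the default shape parameters (`a=3.7` for SCAD, `gamma=3.0` for MCP, `a=1.0` for SICA) are illustrative assumptions.

```python
def l1_penalty(theta, lam):
    """L1 (lasso) penalty: grows linearly without bound."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, then flat (assumes a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(theta, lam, gamma=3.0):
    """MCP: concave quadratic up to gamma*lam, then flat (assumes gamma > 1)."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


def sica_penalty(theta, lam, a=1.0):
    """SICA: lam * (a + 1) * t / (a + t), bounded above by lam * (a + 1) (assumes a > 0)."""
    t = abs(theta)
    return lam * (a + 1) * t / (a + t)


if __name__ == "__main__":
    # Compare how each penalty treats a small and a large coefficient.
    for theta in (0.5, 10.0):
        print(theta,
              l1_penalty(theta, 1.0),
              scad_penalty(theta, 1.0),
              mcp_penalty(theta, 1.0),
              sica_penalty(theta, 1.0))
```

All three concave penalties level off for large coefficients while the $L_1$ penalty keeps growing, which is the usual intuition for why concave penalties shrink large effects less.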

What problem does this paper attempt to address?

This paper addresses how to perform variable selection and estimation effectively in high-dimensional sparse modeling with censored survival data. Specifically, it considers a high-dimensional setting in which the number of features $p$ can grow rapidly, polynomially or nonpolynomially, with the sample size $n$, and asks how regularization can deliver simultaneous variable selection and parameter estimation. The paper proposes a class of regularization methods for the additive hazards model that combine the nonconcave penalized likelihood approach with the pseudoscore method.

### Main Problems

1. **High-Dimensional Sparse Modeling**: How to select important variables and estimate parameters when the number of features is much larger than the number of samples.
2. **Censored Survival Data**: How to handle censored observations in survival analysis, especially in a high-dimensional setting.
3. **Variable Selection and Estimation**: How to achieve variable selection and parameter estimation simultaneously through regularization, improving both the predictive performance and the interpretability of the model.

### Solutions

The paper addresses these problems with a class of regularization methods that combine the nonconcave penalized likelihood approach and the pseudoscore method. The main components are:

- **Concave Penalty Functions**: Use sparsity-inducing concave penalties (such as SCAD, MCP, and SICA) to relax the regularity conditions required by the $L_1$ penalty method, yielding sparser models with better predictive performance.
- **Pseudoscore Method**: Use the pseudoscore function to construct the loss function, so that variable selection and parameter estimation can be carried out in a high-dimensional setting.
- **Theoretical Guarantee**: Establish the weak oracle property and the oracle property to provide strong performance guarantees for the proposed methods.

### Theoretical Contributions

- **Weak Oracle Property**: Under mild conditions, the proposed regularized estimator is shown to have the weak oracle property, that is, it performs model selection and estimation consistently.
- **Oracle Property**: Under some additional eigenvalue conditions, the proposed regularized estimator is shown to have the oracle property, that is, it is asymptotically as efficient as the oracle estimator that knows the true sparse model in advance.

### Practical Applications

- **Algorithm Implementation**: Present a coordinate descent algorithm for implementing the proposed methods efficiently and rigorously investigate its convergence properties.
- **Simulation and Empirical Studies**: Demonstrate the effectiveness and practical utility of the proposed methods through simulation studies and a real data example.

In summary, this paper proposes a new class of regularization methods for variable selection and estimation in high-dimensional sparse modeling with censored survival data. Theoretical analysis, simulation studies, and a real data example support the methods' effectiveness and show that concave penalties can improve on the $L_1$ method.
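
To make the coordinate descent idea concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the pseudoscore-based loss can be written as a quadratic form $\frac{1}{2}\beta^\top V \beta - b^\top \beta$, which this page does not spell out. The names `V`, `b`, `lasso_update`, `mcp_update`, and `coordinate_descent` are hypothetical, and the MCP shape parameter `gamma=3.0` is an arbitrary choice.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrinks z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_update(z, v, lam):
    """Minimizer of 0.5*v*x**2 - z*x + lam*|x| (assumes v > 0)."""
    return soft_threshold(z, lam) / v


def mcp_update(z, v, lam, gamma=3.0):
    """Minimizer of 0.5*v*x**2 - z*x + MCP(|x|); assumes v > 1/gamma."""
    if abs(z) <= gamma * lam * v:
        return soft_threshold(z, lam) / (v - 1.0 / gamma)
    return z / v  # large signals are left unshrunk


def coordinate_descent(V, b, lam, update=lasso_update, max_iter=1000, tol=1e-10):
    """Cycle through coordinates, solving each one-dimensional problem exactly.

    V is a symmetric matrix given as a list of lists and b is a list;
    both are placeholders for the quadratic loss assumed above.
    """
    p = len(b)
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: drop coordinate j's own contribution.
            z = b[j] - sum(V[j][k] * beta[k] for k in range(p) if k != j)
            new_value = update(z, V[j][j], lam)
            max_change = max(max_change, abs(new_value - beta[j]))
            beta[j] = new_value
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy problem with true coefficients (2, 0), so b = V @ (2, 0).
    V = [[1.0, 0.5], [0.5, 1.0]]
    b = [2.0, 1.0]
    print(coordinate_descent(V, b, lam=0.5))                     # L1 penalty
    print(coordinate_descent(V, b, lam=0.5, update=mcp_update))  # MCP penalty
```

On this toy problem both penalties set the second coefficient to zero, but the $L_1$ update shrinks the first coefficient to 1.5 while the MCP update returns 2.0. This small case illustrates the reduced bias that motivates concave penalties.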

High-Dimensional Sparse Additive Hazards Regression
