Abstract: Survival analysis is an important research topic with applications in healthcare, business, and manufacturing. One essential tool in this area is the Cox proportional hazards (CPH) model, which is widely used for its interpretability, flexibility, and predictive performance. However, for modern data-science challenges such as high dimensionality (in both $n$ and $p$) and high feature correlations, current algorithms for training the CPH model have drawbacks that prevent us from using the CPH model to its full potential. The root cause is that the current algorithms, based on the Newton method, have trouble converging because second-order derivatives vanish outside the local region of the minimizer. To circumvent this problem, we propose new optimization methods that construct and minimize surrogate functions exploiting hidden mathematical structures of the CPH model. Our new methods are easy to implement and ensure monotonic loss decrease and global convergence. Empirically, we verify the computational efficiency of our methods. As a direct application, we show how our optimization methods can be used to solve the cardinality-constrained CPH problem, producing very sparse, high-quality models that were not previously practical to construct. We list several extensions that our breakthrough enables, including optimization opportunities, theoretical questions on CPH's mathematical structure, and other CPH-related applications.
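For reference, the objective that CPH training minimizes is the negative log partial likelihood $\ell(\beta) = \sum_i \delta_i \big[\log \sum_{j:\, t_j \ge t_i} e^{x_j^\top \beta} - x_i^\top \beta\big]$, where $\delta_i$ is the event indicator. The sketch below is a minimal NumPy illustration, assuming the Breslow form with no tied event times; the function name `cox_loss_and_grad` is hypothetical and not taken from the paper's code. Once the rows are sorted by time, every risk-set sum becomes a reverse cumulative sum, so all risk-set sums cost $O(n)$ rather than $O(n^2)$.

```python
import numpy as np


def cox_loss_and_grad(X, time, event, beta):
    """Negative log partial likelihood of the CPH model and its gradient.

    Illustrative assumptions: Breslow convention, no tied event times,
    and risk set R_i = {j : time_j >= time_i}.
    """
    order = np.argsort(time, kind="stable")      # sort by time once: O(n log n)
    X, event = X[order], event[order].astype(float)
    eta = X @ beta                               # linear predictors x_i^T beta
    shift = eta.max()                            # subtract max for numerical stability
    w = np.exp(eta - shift)
    # S[i] = sum_{j >= i} w[j]: one reverse cumulative sum gives every risk-set sum
    S = np.cumsum(w[::-1])[::-1]
    loss = np.sum(event * (np.log(S) + shift - eta))
    # d loss / d eta_k = -event_k + w_k * sum_{i <= k} event_i / S_i
    grad_eta = -event + w * np.cumsum(event / S)
    return loss, X.T @ grad_eta                  # chain rule back to beta
```

A central finite-difference check against the returned loss is a quick way to validate the gradient before plugging it into any optimizer.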

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the difficulty of training the Cox proportional hazards (CPH) model on modern data-science problems. Specifically, it targets the following issues:

1. **High-dimensional data and highly correlated features**: As the sample size and the number of features grow, and as features become highly correlated, existing CPH training algorithms struggle to converge and to reach high precision. These problems prevent the CPH model from realizing its full potential.
2. **Limitations of existing optimization methods**: Current optimization algorithms based on Newton's method are computationally expensive on high-dimensional data. Moreover, because second-order derivatives vanish far from the minimizer, these algorithms have difficulty converging outside its local region: the loss may diverge or decrease very slowly.
3. **Trade-off between accuracy and efficiency**: Existing methods are either accurate but computationally expensive (such as the exact Newton method) or cheap but slow to converge and less accurate (such as quasi-Newton and proximal Newton methods). This trade-off limits the performance of the CPH model in practice.

To address these problems, the paper proposes new optimization methods that construct and minimize surrogate functions exploiting hidden mathematical structure in the CPH model. The methods guarantee monotonic loss decrease and global convergence while improving computational efficiency, so they can handle high-dimensional data and highly correlated features effectively. They can also be used to solve the cardinality-constrained CPH problem, producing very sparse, high-quality models that were difficult to obtain with previous methods.
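The convergence problem in item 2 is already visible in one dimension. Along a single coordinate, the second derivative of the CPH loss is a sum over events of the weighted variance of that feature within each risk set, with weights $e^{\eta_j}$. When $|\beta|$ is large, the weights concentrate on a single sample per risk set, those variances collapse toward zero, and the Newton step $-g/h$ becomes enormous. The toy sketch below uses hypothetical names and made-up data, and assumes the rows are already sorted by time with no ties.

```python
import numpy as np


def reverse_cumsum(a):
    """out[i] = sum(a[i:]), i.e. a sum over the risk set of row i."""
    return np.cumsum(a[::-1])[::-1]


def coordinate_derivatives(x, event, eta):
    """First and second derivatives of the CPH loss along one feature column x.

    Assumes rows sorted by ascending time with no ties, so the risk set of
    row i is every row j >= i (an illustrative simplification).
    """
    w = np.exp(eta - eta.max())                      # stabilised exp(eta)
    A = reverse_cumsum(w)                            # total weight of each risk set
    mean = reverse_cumsum(w * x) / A                 # weighted mean of x per risk set
    var = np.maximum(reverse_cumsum(w * x**2) / A - mean**2, 0.0)  # weighted variance
    grad = np.sum(event * (mean - x))
    hess = np.sum(event * var)                       # sum of risk-set variances
    return grad, hess


# Toy data: one feature, rows already sorted by time, every row an event.
x = np.array([1.0, 0.0, 3.0, 2.0, 4.0])
event = np.ones(5)
for b in [0.0, 2.0, 8.0]:
    g, h = coordinate_derivatives(x, event, b * x)
    print(f"beta={b:4.1f}  grad={g:8.3f}  hess={h:.2e}  newton_step={-g / h:.2e}")
```

On this toy data the second derivative shrinks by several orders of magnitude between `beta=0` and `beta=8` while the gradient stays of order one, so the Newton step grows from under one to roughly $10^4$ in magnitude and overshoots the minimizer.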
### Specific contributions

- **Identifying shortcomings of existing optimization algorithms**: Showing that current optimization algorithms converge slowly and reach only low precision on high-dimensional data with highly correlated features.
- **Proposing new optimization algorithms**: Minimizing quadratic and cubic surrogate functions, which guarantees a loss decrease at every iteration and global convergence.
- **Efficiently computing partial derivatives**: Uncovering hidden mathematical structure in the CPH loss function that allows the exact first-, second-, and third-order partial derivatives to be computed in $O(n)$ time, and proving that these partial derivatives are Lipschitz continuous.
- **Extending applications**: Demonstrating the new methods on variable selection, regularization, and constrained problems, with particularly strong performance when features are highly correlated.

Through these improvements, the paper provides a more efficient and accurate way to train the CPH model, an important methodological advance in survival analysis.
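One simple way to build a quadratic surrogate with a guaranteed loss decrease is to bound the coordinate-wise second derivative globally. Because that second derivative is a sum of within-risk-set variances, Popoviciu's inequality gives $L_l = \frac{1}{4}\sum_i \delta_i \big(\max_{j \in R_i} x_{jl} - \min_{j \in R_i} x_{jl}\big)^2$, which holds for every $\beta$. Minimizing the surrogate $\ell(\beta) + g_l d + \frac{L_l}{2} d^2$ over the step $d$ gives $d = -g_l / L_l$, so the loss can never increase. This is a hedged sketch: the bound, the function names, and the unregularized objective are assumptions for illustration, not necessarily the constants or algorithm the paper derives, and the cubic surrogate is not shown. Rows are assumed sorted by time with no ties.

```python
import numpy as np


def rev_cumsum(a):
    """Reverse cumulative sum: out[i] = sum(a[i:])."""
    return np.cumsum(a[::-1])[::-1]


def cox_loss(X, event, beta):
    """Breslow negative log partial likelihood; rows sorted by time, no ties."""
    eta = X @ beta
    m = eta.max()
    return np.sum(event * (np.log(rev_cumsum(np.exp(eta - m))) + m - eta))


def lipschitz_bounds(X, event):
    """Per-coordinate global upper bound on the CPH second derivative.

    Along coordinate l the second derivative is sum_i event_i * Var_{R_i}(x_l);
    Popoviciu's inequality bounds each variance by (max - min)^2 / 4.
    """
    run_max = np.maximum.accumulate(X[::-1], axis=0)[::-1]   # max over rows j >= i
    run_min = np.minimum.accumulate(X[::-1], axis=0)[::-1]   # min over rows j >= i
    return (event[:, None] * (run_max - run_min) ** 2).sum(axis=0) / 4.0


def surrogate_coordinate_descent(X, event, n_sweeps=50):
    """Cyclic coordinate descent, each step minimizing a quadratic upper bound."""
    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)                                # maintained as X @ beta
    L = lipschitz_bounds(X, event)
    history = [cox_loss(X, event, beta)]
    for _ in range(n_sweeps):
        for l in range(p):
            if L[l] == 0:                            # feature constant on all risk sets
                continue
            w = np.exp(eta - eta.max())
            mean = rev_cumsum(w * X[:, l]) / rev_cumsum(w)
            g = np.sum(event * (mean - X[:, l]))     # d loss / d beta_l in O(n)
            step = -g / L[l]                         # minimizer of the quadratic surrogate
            beta[l] += step
            eta += step * X[:, l]                    # O(n) update of the predictors
        history.append(cox_loss(X, event, beta))
    return beta, history
```

Because each step minimizes an upper bound that touches the loss at the current point, the recorded `history` is non-increasing. The global bound is looser than a local second-derivative estimate, so this version trades some speed per iteration for guaranteed descent.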

FastSurvival: Hidden Computational Blessings in Training Cox Proportional Hazards Models
