FastSurvival: Hidden Computational Blessings in Training Cox Proportional Hazards Models

Jiachang Liu, Rui Zhang, Cynthia Rudin
2024-10-25
Abstract: Survival analysis is an important research topic with applications in healthcare, business, and manufacturing. One essential tool in this area is the Cox proportional hazards (CPH) model, which is widely used for its interpretability, flexibility, and predictive performance. However, for modern data science challenges such as high dimensionality (both $n$ and $p$) and high feature correlations, current algorithms to train the CPH model have drawbacks, preventing us from using the CPH model at its full potential. The root cause is that the current algorithms, based on the Newton method, have trouble converging due to vanishing second order derivatives when outside the local region of the minimizer. To circumvent this problem, we propose new optimization methods by constructing and minimizing surrogate functions that exploit hidden mathematical structures of the CPH model. Our new methods are easy to implement and ensure monotonic loss decrease and global convergence. Empirically, we verify the computational efficiency of our methods. As a direct application, we show how our optimization methods can be used to solve the cardinality-constrained CPH problem, producing very sparse high-quality models that were not previously practical to construct. We list several extensions that our breakthrough enables, including optimization opportunities, theoretical questions on CPH's mathematical structure, as well as other CPH-related applications.
Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses difficulties in training the Cox proportional hazards (CPH) model under modern data science conditions. It targets three issues:

1. **High-dimensional data and highly correlated features**: As the sample size $n$ and the number of features $p$ grow, and as features become highly correlated, existing CPH training algorithms struggle to converge and return solutions of insufficient precision. This prevents the CPH model from reaching its full potential.
2. **Limitations of existing optimization methods**: Current algorithms based on Newton's method are computationally expensive on high-dimensional data. Because the second-order derivatives vanish outside the local region of the minimizer, these algorithms have trouble converging when started far from it: the loss may diverge or decrease very slowly.
3. **Trade-off between accuracy and efficiency**: Existing methods are either accurate but computationally expensive (such as the exact Newton method) or cheap per iteration but slow to converge and less accurate (such as quasi-Newton and proximal Newton methods). This trade-off limits how well the CPH model performs in practice.

To address these problems, the paper proposes new optimization methods that construct and minimize surrogate functions exploiting hidden mathematical structure in the CPH model. The methods guarantee monotonic loss decrease and global convergence while improving computational efficiency, so they handle high-dimensional data and highly correlated features effectively. They can also be used to solve the cardinality-constrained CPH problem, producing very sparse, high-quality models that were not previously practical to construct.
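The surrogate idea described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: rows are assumed sorted by descending observed time with no ties (so the risk set of row `i` is rows `0..i`), the function names (`cox_loss`, `coord_grad`, `coord_curvature_bound`, `surrogate_cd`) are hypothetical, and the curvature constant is obtained here from Popoviciu's inequality (the variance of a bounded variable is at most a quarter of its squared range), which may differ from the constant the authors use. Each coordinate update minimizes a quadratic upper bound of the loss, so the loss cannot increase.

```python
import numpy as np

def cox_loss(beta, X, delta):
    """Negative log partial likelihood.

    Assumes rows are sorted by descending time with no ties,
    so the risk set of row i is rows 0..i.
    """
    eta = X @ beta
    # Running log-sum-exp gives log(sum_{j in R_i} exp(eta_j)) for every i.
    log_risk = np.logaddexp.accumulate(eta)
    return float(np.sum(delta * (log_risk - eta)))

def coord_grad(beta, X, delta, l):
    """Partial derivative of the loss with respect to beta[l]."""
    eta = X @ beta
    w = np.exp(eta - eta.max())  # the shift cancels in the ratio below
    risk_mean = np.cumsum(w * X[:, l]) / np.cumsum(w)
    return float(np.sum(delta * (risk_mean - X[:, l])))

def coord_curvature_bound(X, delta, l):
    """Upper bound on the second partial derivative, valid for every beta.

    The second partial is a sum of risk-set variances of X[:, l];
    each variance is at most (range / 2) ** 2 (Popoviciu's inequality).
    """
    hi = np.maximum.accumulate(X[:, l])
    lo = np.minimum.accumulate(X[:, l])
    return float(0.25 * np.sum(delta * (hi - lo) ** 2))

def surrogate_cd(X, delta, n_sweeps=20):
    """Coordinate descent where each step minimizes a quadratic surrogate."""
    p = X.shape[1]
    beta = np.zeros(p)
    bounds = [coord_curvature_bound(X, delta, l) for l in range(p)]
    losses = [cox_loss(beta, X, delta)]
    for _ in range(n_sweeps):
        for l in range(p):
            if bounds[l] > 0:
                # Minimizer of f(b) + g * t + 0.5 * bound * t**2 over t.
                beta[l] -= coord_grad(beta, X, delta, l) / bounds[l]
        losses.append(cox_loss(beta, X, delta))
    return beta, losses
```

Called on data sorted by descending time, `surrogate_cd(X, delta)` returns the coefficients and a loss trace that never increases, which is the monotonic-decrease property the summary attributes to the surrogate approach. The sketch recomputes the linear predictor for every coordinate and is not hardened against extreme values of `X @ beta`.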
### Specific contributions

- **Identifying the shortcomings of existing optimization algorithms**: The paper shows that current algorithms converge slowly and reach low precision on high-dimensional data with highly correlated features.
- **Proposing new optimization algorithms**: The proposed methods minimize quadratic and cubic surrogate functions, which guarantees a loss decrease at every iteration and global convergence.
- **Computing partial derivatives efficiently**: By exploiting hidden mathematical structure in the CPH loss function, the first-, second-, and third-order partial derivatives can be computed accurately in O(n) time, and the paper proves that these partial derivatives are Lipschitz continuous.
- **Extending to applications**: The paper demonstrates the new methods on variable selection, regularized, and constrained problems, with especially strong performance when features are highly correlated.

Through these improvements, the paper provides a more efficient and accurate way to train the CPH model, an important methodological advance for survival analysis.
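The O(n) derivative computation listed among the contributions can be illustrated with cumulative sums. The sketch below is an assumption-laden example, not the authors' code: it takes the linear predictor `eta` as given, assumes rows are sorted by descending observed time with no ties (so the risk set of row `i` is rows `0..i`), and uses a hypothetical function name `cox_coord_derivatives`. The formulas come from differentiating the log-sum-exp term of the CPH loss, whose first three derivatives are the risk-set mean, variance, and third central moment of the feature.

```python
import numpy as np

def cox_coord_derivatives(eta, x, delta):
    """First three partial derivatives of the CPH loss for one coordinate.

    eta   : linear predictor X @ beta, shape (n,)
    x     : the feature column for this coordinate, shape (n,)
    delta : event indicators (1 = event, 0 = censored), shape (n,)

    Assumes rows are sorted by descending time with no ties.
    Every step is a single pass over n values, so the cost is O(n).
    """
    w = np.exp(eta - eta.max())       # the shift cancels in the ratios below
    s0 = np.cumsum(w)                 # risk-set sums of the weights
    m1 = np.cumsum(w * x) / s0        # risk-set weighted mean of x
    m2 = np.cumsum(w * x ** 2) / s0   # weighted second raw moment
    m3 = np.cumsum(w * x ** 3) / s0   # weighted third raw moment
    d1 = np.sum(delta * (m1 - x))                           # gradient entry
    d2 = np.sum(delta * (m2 - m1 ** 2))                     # sum of variances
    d3 = np.sum(delta * (m3 - 3 * m2 * m1 + 2 * m1 ** 3))   # third central moments
    return float(d1), float(d2), float(d3)
```

A naive implementation would rebuild each risk-set sum from scratch, costing O(n^2) per coordinate; the cumulative sums reuse the previous risk set's totals. The second derivative is a sum of variances, so it is non-negative and shrinks toward zero when the weights concentrate on a single sample, which is consistent with the vanishing-curvature problem described for Newton-type methods.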