FPBoost: Fully Parametric Gradient Boosting for Survival Analysis

Alberto Archetti, Eugenio Lomurno, Diego Piccinotti, Matteo Matteucci
2024-09-20
Abstract: Survival analysis is a critical tool for analyzing time-to-event data and extracting valuable clinical insights. Recently, numerous machine learning techniques leveraging neural networks and decision trees have been developed for this task. Among these, the most successful approaches often rely on specific assumptions about the shape of the modeled hazard function. These assumptions include proportional hazard, accelerated failure time, or discrete estimation at a predefined set of time points. In this study, we propose a novel paradigm for survival model design based on the weighted sum of individual fully parametric hazard contributions. We build upon well-known ensemble techniques to deliver a novel contribution to the field by applying additive hazard functions, improving over approaches based on survival or cumulative hazard functions. Furthermore, the proposed model, which we call FPBoost, is the first algorithm to directly optimize the survival likelihood via gradient boosting. We evaluated our approach across a diverse set of datasets, comparing it against a variety of state-of-the-art models. The results demonstrate that FPBoost improves risk estimation, according to both concordance and calibration metrics.
Machine Learning, Artificial Intelligence
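The core modeling idea in the abstract, an additive hazard built as a weighted sum of fully parametric hazards and trained on the full censored likelihood, can be illustrated with a minimal sketch. It assumes Weibull heads only and takes the per-sample weights, shapes, and scales as given inputs; in FPBoost these would be produced by boosted trees, and the helper names below (`weibull_hazard`, `mixture_hazard`, `log_likelihood`, and so on) are hypothetical. The mixture hazard is h(t) = Σ_j w_j h_j(t); because integration is linear, its cumulative hazard is H(t) = Σ_j w_j H_j(t), the survival function is S(t) = exp(-H(t)), and a sample with time t and event indicator δ contributes δ log h(t) - H(t) to the log-likelihood.

```python
import math

# Illustrative sketch only: a mixture of Weibull hazard "heads".
# Assumptions: Weibull heads, nonnegative weights, t > 0, and parameters
# passed in directly (in FPBoost they would come from boosted trees).

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / lam) ** k."""
    return (t / scale) ** shape

def mixture_hazard(t, weights, shapes, scales):
    """Additive hazard: weighted sum of the individual head hazards."""
    return sum(w * weibull_hazard(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def mixture_cum_hazard(t, weights, shapes, scales):
    """Integration is linear, so the cumulative hazard is also a weighted sum."""
    return sum(w * weibull_cum_hazard(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))

def survival(t, weights, shapes, scales):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-mixture_cum_hazard(t, weights, shapes, scales))

def log_likelihood(t, event, weights, shapes, scales):
    """Right-censored log-likelihood of one sample: event * log h(t) - H(t)."""
    ll = -mixture_cum_hazard(t, weights, shapes, scales)
    if event:
        # Only observed events contribute the log-hazard term.
        ll += math.log(mixture_hazard(t, weights, shapes, scales))
    return ll
```

For a single head with shape 1 and scale 2 (an exponential model with constant hazard 0.5), an observed event at t = 2 contributes log(0.5) - 1 ≈ -1.693, while a censoring at t = 2 contributes only -H(2) = -1.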
What problem does this paper attempt to address?
This paper addresses the limitations of existing survival analysis models, in particular the restrictive assumptions they impose when modeling time-to-event data. Specifically:

1. **Restrictive Assumptions in Existing Models**:
   - Many survival analysis methods rely on specific assumptions, such as proportional hazards, accelerated failure time, or discrete estimation at predefined time points. These assumptions simplify the model but can limit its ability to generalize.
   - Traditional survival models are usually trained on simplified objectives (such as the Cox partial likelihood) or on discrete loss functions, which may prevent them from fully exploiting continuous-time information.
2. **Improving Model Flexibility and Accuracy**:
   - The authors aim to build a more flexible survival model with fewer assumptions, one that captures complex patterns in the data and yields more accurate risk estimates.
   - With the proposed Fully Parametric Gradient Boosting model (FPBoost), they directly optimize the survival likelihood instead of relying on simplified objectives or discrete losses.
3. **Combining the Advantages of Ensemble Learning**:
   - FPBoost combines gradient boosting with decision trees and models survival data as a weighted sum of multiple fully parametric hazard functions. According to the authors, this increases the model's flexibility and improves its generalization.
4. **Evaluating Model Performance**:
   - To validate FPBoost, the authors ran experiments on multiple datasets and compared it with several state-of-the-art survival models, including tree-based and neural-network-based methods.
   - The results show that FPBoost performs well on both concordance and calibration metrics and outperforms traditional methods, especially on data with complex patterns.

In summary, the main objective of this paper is to develop a new survival analysis model, FPBoost, that overcomes the restrictive assumptions of existing models, improves flexibility and accuracy, and demonstrates strong performance experimentally across diverse datasets.
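The claim that FPBoost optimizes the survival likelihood directly via gradient boosting can be made concrete with a deliberately reduced sketch. It assumes a single Weibull head with a fixed, hypothetical shape `K`, one covariate, and depth-1 regression stumps fitted to the negative gradient of the censored negative log-likelihood with respect to the log-scale η = log λ(x). This is a generic gradient-boosting illustration under those assumptions, not the authors' implementation, which (per the abstract) learns a weighted combination of several parametric hazard heads; all function names here are hypothetical.

```python
import math

K = 1.5  # hypothetical fixed Weibull shape (assumption for this sketch)

def neg_log_lik(t, event, eta):
    """Per-sample loss -(event * log h(t) - H(t)) for a Weibull with scale exp(eta)."""
    log_h = math.log(K) + (K - 1) * math.log(t) - K * eta
    cum_h = math.exp(K * (math.log(t) - eta))
    return -(event * log_h - cum_h)

def neg_gradient(t, event, eta):
    """Negative derivative of the loss with respect to eta: K * (H(t) - event)."""
    cum_h = math.exp(K * (math.log(t) - eta))
    return K * (cum_h - event)

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (single threshold split) by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # the largest threshold would leave the right side empty
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def boost(xs, times, events, n_rounds=50, learning_rate=0.1):
    """Boost the log-scale eta(x), starting from the log of the mean time."""
    init = math.log(sum(times) / len(times))
    stumps = []

    def predict(x):
        return init + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # Each stump is fitted to the negative gradient of the likelihood loss.
        residuals = [neg_gradient(t, e, predict(x)) for x, t, e in zip(xs, times, events)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mean_loss(predict, xs, times, events):
    """Average censored negative log-likelihood of a log-scale predictor."""
    return sum(neg_log_lik(t, e, predict(x)) for x, t, e in zip(xs, times, events)) / len(xs)

# Toy data: covariate x = 1 is associated with longer survival times.
xs = [0, 0, 0, 1, 1, 1]
times = [1.0, 1.5, 2.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 1]  # 0 marks a right-censored sample
model = boost(xs, times, events)
```

Because the stump can only separate the two covariate groups, each group's log-scale approaches its closed-form Weibull maximum-likelihood estimate for a fixed shape, (1/K) log(Σ t_i^K / Σ δ_i), which is a simple way to check that the loop minimizes the likelihood itself rather than a surrogate loss.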