Random forests for survival data: which methods work best and under what conditions?

Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin
DOI: https://doi.org/10.1515/ijb-2023-0056
2024-04-25
The International Journal of Biostatistics
Abstract: Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods' relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.
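To make the simulation factors above concrete, here is a minimal sketch of how one might generate right-censored survival data in which the response distribution, linear predictor, censoring level, sample size, and number of noise covariates can each be varied. It is not the authors' simulation design: the Weibull proportional-hazards model, the independent exponential censoring, and the function and parameter names (`simulate_survival`, `censor_hazard`, `n_noise`) are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_survival(n, beta, shape=1.5, scale=1.0, censor_hazard=0.5,
                      n_noise=2, seed=0):
    """Simulate right-censored survival data (hypothetical design).

    Event times follow a Weibull proportional-hazards model,
        S(t | x) = exp(-(t / scale) ** shape * exp(eta)),  eta = x_signal . beta,
    and censoring times are independent Exponential(censor_hazard) draws.
    n_noise extra N(0, 1) covariates are generated but never enter eta.
    Pass censor_hazard=None for uncensored data.
    Returns a list of (covariates, observed_time, event) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x_signal = [rng.gauss(0.0, 1.0) for _ in beta]
        x_noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        eta = sum(b * x for b, x in zip(beta, x_signal))
        # Inverse-CDF draw: solve S(T | x) = U for T, with U in (0, 1].
        u = 1.0 - rng.random()
        t_event = scale * (-math.log(u) * math.exp(-eta)) ** (1.0 / shape)
        # Censoring is independent of the covariates in this sketch.
        if censor_hazard is None:
            t_censor = math.inf
        else:
            t_censor = rng.expovariate(censor_hazard)
        # Observe the earlier of the two times; event = 1 if the event came first.
        data.append((x_signal + x_noise, min(t_event, t_censor),
                     int(t_event <= t_censor)))
    return data
```

For example, `simulate_survival(500, beta=[1.0, -0.5], censor_hazard=0.5)` yields 500 observations with two signal and two noise covariates; raising `censor_hazard` increases the censored fraction, and replacing the linear predictor `eta` with a nonlinear function would mimic a different predictor structure.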
statistics & probability, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses the lack of systematic comparisons among methods for constructing survival trees and forests. Using an extensive simulation study, the authors examine how several factors affect survival forest performance: the forest construction method, censoring, sample size, the distribution of the response, the structure of the linear predictor, and the presence of correlated or noisy covariates. The goal is to determine which methods work best, and under which conditions, for two tasks: point prediction of survival times and estimation of survival functions. Based on their comparison of 11 recently proposed methods, of which 6 emerge as top performers, the authors recommend which methods to use for survival data with particular properties and offer explanations for the observed differences in relative performance.
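The two prediction tasks are linked: a forest first produces an estimated survival function, and a point prediction of survival time can then be read off that function. The sketch below illustrates one common pattern under stated assumptions: a Kaplan-Meier estimate in each tree's terminal node, an ensemble survival curve formed by averaging the per-tree curves, and two possible point predictions (the median survival time and the restricted mean survival time up to a horizon `tau`). The function names are hypothetical, and this is not necessarily how any of the 11 compared methods aggregate trees or define point predictions; some methods aggregate differently, for example by averaging cumulative hazard estimates.

```python
import math
from bisect import bisect_right


def kaplan_meier(times, events):
    """Kaplan-Meier estimate for one terminal node.

    Returns (jump_times, survival) where survival[k] = S(jump_times[k]).
    events[i] is 1 for an observed event and 0 for a censored time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    jump_times, survival = [], []
    s, i = 1.0, 0
    while i < len(pairs):
        t, deaths, leaving = pairs[i][0], 0, 0
        # Group tied times; events at t count against everyone still at risk.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            s *= (at_risk - deaths) / at_risk
            jump_times.append(t)
            survival.append(s)
        at_risk -= leaving
    return jump_times, survival


def survival_at(curve, t):
    """Evaluate a right-continuous step survival curve at time t."""
    jump_times, survival = curve
    k = bisect_right(jump_times, t)  # number of jumps at or before t
    return 1.0 if k == 0 else survival[k - 1]


def ensemble_survival(curves, t):
    """Average the per-tree survival curves at time t (assumed aggregation)."""
    return sum(survival_at(c, t) for c in curves) / len(curves)


def jump_grid(curves):
    """All times at which the averaged step function can change."""
    return sorted({t for jump_times, _ in curves for t in jump_times})


def median_survival_time(curves):
    """Smallest time at which the ensemble survival drops to 0.5 or below."""
    for t in jump_grid(curves):
        if ensemble_survival(curves, t) <= 0.5:
            return t
    return math.inf  # the curve never reaches 0.5


def restricted_mean(curves, tau):
    """Area under the ensemble survival curve on [0, tau]."""
    knots = [0.0] + [t for t in jump_grid(curves) if 0.0 < t < tau] + [tau]
    # The curve is constant on each [a, b), so each piece is a rectangle.
    return sum(ensemble_survival(curves, a) * (b - a)
               for a, b in zip(knots, knots[1:]))


curves = [kaplan_meier([1, 1, 2, 2], [1, 1, 1, 1]),
          kaplan_meier([2, 4], [1, 0])]
print(median_survival_time(curves))      # 2
print(restricted_mean(curves, tau=3.0))  # 2.0
```

The median is undefined (here, infinite) when the estimated curve never falls to 0.5, which is common under heavy censoring; the restricted mean avoids that problem at the cost of choosing a horizon `tau`.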