Learning to rank for censored survival data

Margaux Luck, Tristan Sylvain, Joseph Paul Cohen, Heloise Cardinal, Andrea Lodi, Yoshua Bengio
DOI: https://doi.org/10.48550/arXiv.1806.01984
2018-06-09
Abstract: Survival analysis is a type of semi-supervised ranking task where the target output (the survival time) is often right-censored. Utilizing this information is a challenge because it is not obvious how to correctly incorporate these censored examples into a model. We study how three categories of loss functions can take advantage of this information: partial likelihood methods, rank methods, and our classification method, which is based on a Wasserstein metric (WM) and uses the non-parametric Kaplan-Meier estimate of the probability density to impute the labels of censored examples. The proposed method allows us to have a model that predicts the probability distribution of an event. If a clinician had access to the detailed probability of an event over time, this would help in treatment planning, for example by determining whether the risk of kidney graft rejection is constant or peaks after some time. We also demonstrate that this approach directly optimizes the expected C-index, which is the most common evaluation metric for ranking survival models.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to make effective use of right-censored data when training ranking models for survival analysis. Survival analysis is a semi-supervised ranking task in which the target output (the survival time) is often right-censored: for some samples, we know only that the event time exceeds a known time point, not when the event actually occurred. This incomplete information makes training difficult, because it is not obvious how to correctly incorporate the censored samples into a model. The paper studies how three categories of loss functions exploit the censoring information: partial likelihood methods, rank methods, and a classification method based on the Wasserstein metric (WM) that uses the non-parametric Kaplan-Meier estimate of the probability density to impute the labels of censored samples. The proposed method yields a model that predicts the probability distribution of an event over time. Access to this detailed distribution would help clinicians plan treatment, for example by determining whether the risk of kidney graft rejection is constant or peaks after some time. The study also shows that this approach directly optimizes the expected C-index, the most commonly used evaluation metric for ranking survival models. The paper verifies the effectiveness of the new method by comparing the proposed loss functions with a series of common ranking-specific loss functions on multiple reference survival datasets.
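To make the role of censoring in the evaluation metric concrete, here is a minimal sketch of the standard (Harrell's) C-index for right-censored data. It is not the paper's expected C-index or its loss function. The function name `concordance_index`, the convention that a higher risk score means an earlier expected event, and the toy data are all assumptions made for illustration. For simplicity, pairs with tied observed times are skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       : observed time per sample (event time or censoring time)
    events      : 1 if the event was observed, 0 if the sample is right-censored
    risk_scores : model output; higher is assumed to mean an earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The shorter time is a censoring time, so the true ordering
            # of the two events is unknown and the pair is not comparable.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # correctly ranked pair
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the second sample is censored at t=4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.8, 0.3, 0.1]
print(concordance_index(times, events, risk_scores))  # 1.0
```

In the toy data, the censored sample still contributes to one comparable pair (against the sample with the event at t=2), but it cannot be compared with the samples whose events occur after its censoring time. This partial usability is the kind of information the loss functions discussed above try to exploit.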