Abstract: In recent years, deep learning has become an attractive tool for the analysis of survival data. This has led to the advent of numerous network architectures for the prediction of possibly censored time-to-event variables. Unlike networks for cross-sectional data (used, e.g., in classification), deep survival networks require the specification of a suitably defined loss function that incorporates typical characteristics of survival data such as censoring and time-dependent features. Here, we provide an in-depth analysis of the cross-entropy loss function, which is a popular loss function for training deep survival networks. For each time point $t$, the cross-entropy loss is defined in terms of a binary outcome with levels "event at or before $t$" and "event after $t$".
Using both theoretical and empirical approaches, we show that this definition may result in a high prediction error and a heavy bias in the predicted survival probabilities. To overcome this problem, we analyze an alternative loss function that is derived from the negative log-likelihood function of a discrete time-to-event model. We show that replacing the cross-entropy loss by the negative log-likelihood loss results in much better calibrated prediction rules and also in improved discriminatory power, as measured by the concordance index.
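
To make the two loss definitions concrete, the following is a minimal sketch for a single subject on a discrete time grid. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 0-based interval index `time_idx`, the convention that a subject censored in interval `time_idx` is known to be event-free through that interval, and the choice to skip time points after censoring in the cross-entropy sum are all assumptions introduced here. The network output is taken to be a vector of discrete hazards $h_j = P(T = j \mid T \ge j)$, so that $S(t) = \prod_{j \le t} (1 - h_j)$.

```python
import numpy as np


def survival_from_hazards(hazards):
    """S(t) = prod_{j <= t} (1 - h_j), one value per time point."""
    return np.cumprod(1.0 - hazards)


def cross_entropy_loss(hazards, time_idx, event, eps=1e-12):
    """Sum of per-time-point binary cross-entropies.

    At each time point t the binary outcome is 1 for "event at or before t"
    and 0 for "event after t"; the predicted probability of 1 is 1 - S(t).
    Assumption (ours): for a censored subject, time points after the
    censoring interval have unknown status and are skipped.
    """
    surv = survival_from_hazards(hazards)
    cdf = 1.0 - surv
    grid = np.arange(len(hazards))
    if event:
        y = (grid >= time_idx).astype(float)  # event at or before t
        known = np.ones(len(hazards), dtype=bool)
    else:
        y = np.zeros(len(hazards))            # event-free while observed
        known = grid <= time_idx
    bce = -(y * np.log(cdf + eps) + (1.0 - y) * np.log(surv + eps))
    return bce[known].sum()


def discrete_nll(hazards, time_idx, event, eps=1e-12):
    """Negative log-likelihood of a discrete time-to-event model.

    Event in interval time_idx:    prod_{j < idx} (1 - h_j) * h_idx
    Censored in interval time_idx: prod_{j <= idx} (1 - h_j)  (assumed convention)
    """
    log_surv_before = np.sum(np.log(1.0 - hazards[:time_idx] + eps))
    if event:
        return -(log_surv_before + np.log(hazards[time_idx] + eps))
    return -(log_surv_before + np.log(1.0 - hazards[time_idx] + eps))


if __name__ == "__main__":
    # Hypothetical hazards for three time points, for illustration only.
    h = np.array([0.1, 0.2, 0.3])
    print("cross-entropy, event at index 1:", cross_entropy_loss(h, 1, True))
    print("neg. log-lik., event at index 1:", discrete_nll(h, 1, True))
```

In this sketch the negative log-likelihood only uses the hazards up to the observed interval, whereas the cross-entropy sum for an observed event also penalizes the predicted survival at every later time point. The example does not by itself reproduce the bias reported in the abstract; it only shows how the two quantities are computed.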

Correlations of Cross-Entropy Loss in Machine Learning

Cross-Entropy Loss Functions: Theoretical Analysis and Applications

A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses

Understanding the Behaviour of the Empirical Cross-Entropy Beyond the Training Distribution

Generalization of Cross-Entropy Loss Function for Image Classification

$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy

Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement

On the Versatile Uses of Partial Distance Correlation in Deep Learning

Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

Limits to classification performance by relating Kullback-Leibler divergence to Cohen's Kappa

CC-Loss: Channel Correlation Loss For Image Classification

Cross Entropy in Deep Learning of Classifiers Is Unnecessary—ISBE Error Is All You Need

Improving Deep Regression with Ordinal Entropy

On the Rényi Cross-Entropy

Bias in Cross-Entropy-Based Training of Deep Survival Networks

Decoupled Kullback-Leibler Divergence Loss

Neural Bregman Divergences for Distance Learning

Avoiding spurious correlations via logit correction

Cut your Losses with Squentropy