The Dice loss in the context of missing or empty labels: Introducing $Φ$ and $ε$

Sofie Tilborghs, Jeroen Bertels, David Robben, Dirk Vandermeulen, Frederik Maes
DOI: https://doi.org/10.1007/978-3-031-16443-9_51
2022-11-09
Abstract: Albeit the Dice loss is one of the dominant loss functions in medical image segmentation, most research omits a closer look at its derivative, i.e. the real motor of the optimization when using gradient descent. In this paper, we highlight the peculiar action of the Dice loss in the presence of missing or empty labels. First, we formulate a theoretical basis that gives a general description of the Dice loss and its derivative. It turns out that the choice of the reduction dimensions $\Phi$ and the smoothing term $\epsilon$ is non-trivial and greatly influences its behavior. We find and propose heuristic combinations of $\Phi$ and $\epsilon$ that work in a segmentation setting with either missing or empty labels. Second, we empirically validate these findings in a binary and multiclass segmentation setting using two publicly available datasets. We confirm that the choice of $\Phi$ and $\epsilon$ is indeed pivotal. With $\Phi$ chosen such that the reductions happen over a single batch (and class) element and with a negligible $\epsilon$, the Dice loss deals with missing labels naturally and performs similarly compared to recent adaptations specific for missing labels. With $\Phi$ chosen such that the reductions happen over multiple batch elements or with a heuristic value for $\epsilon$, the Dice loss handles empty labels correctly. We believe that this work highlights some essential perspectives and hope that it encourages researchers to better describe their exact implementation of the Dice loss in future work.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses problems that arise when the Dice loss is used for medical image segmentation in the presence of missing or empty labels. It focuses on two issues:

1. **Handling of missing or empty labels**: Training data may contain missing labels (a class is not annotated in some samples) or empty labels (a class is absent from a sample, so its label map is empty). In both cases the gradient of the Dice loss can behave in unintended ways during optimization, which affects the performance of the model.
2. **Configuration of the Dice loss function**: The paper studies how two choices in the Dice loss, the **reduction dimensions \(\Phi\)** and the **smoothing term \(\epsilon\)**, influence its behavior. These choices are pivotal for handling missing or empty labels, yet existing work rarely describes them.

### Main contributions of the paper

1. **Theoretical analysis**:
   - The paper gives a general description of the Dice loss and its derivative and analyzes their behavior with missing or empty labels. The authors show that the choice of the reduction dimensions \(\Phi\) and the smoothing term \(\epsilon\) is non-trivial and greatly influences that behavior.
   - The authors propose heuristic combinations of \(\Phi\) and \(\epsilon\) that work in a segmentation setting with either missing or empty labels.
2. **Experimental verification**:
   - The authors validate the theoretical findings empirically in a binary and a multiclass segmentation setting, using two publicly available datasets.
   - With \(\Phi\) reducing over a single batch (and class) element and a negligible \(\epsilon\), the Dice loss deals with missing labels naturally and performs similarly to recent adaptations designed specifically for missing labels. With \(\Phi\) reducing over multiple batch elements, or with a heuristic value for \(\epsilon\), the Dice loss handles empty labels correctly.

### Key formulas

In the formulas below, \(y_i\) and \(\tilde{y}_i\) are the ground-truth and predicted values at element \(i\), each \(\phi\in\Phi\) is a set of elements over which one DSC term is computed, and \(\phi_\omega\) is the set that contains element \(\omega\).

- **Dice Similarity Coefficient (DSC)**:
  \[ \text{DSC}(Y_\phi, \tilde{Y}_\phi)=\frac{2|Y_\phi\cap\tilde{Y}_\phi|}{|Y_\phi| + |\tilde{Y}_\phi|} \]
- **Smoothed Dice Loss (DL)**:
  \[ \text{DL}(Y, \tilde{Y}) = 1-\frac{1}{|\Phi|}\sum_{\phi\in\Phi}\frac{2\sum_{i\in\phi}y_i\tilde{y}_i+\epsilon}{\sum_{i\in\phi}(y_i+\tilde{y}_i)+\epsilon} \]
- **Derivative of Dice Loss**:
  \[ \frac{\partial\text{DL}(Y, \tilde{Y})}{\partial\tilde{y}_\omega}=-\frac{1}{|\Phi|}\left(\frac{2y_\omega\left(\sum_{i\in\phi_\omega}(y_i+\tilde{y}_i)+\epsilon\right) - \left(2\sum_{i\in\phi_\omega}y_i\tilde{y}_i+\epsilon\right)}{\left(\sum_{i\in\phi_\omega}(y_i+\tilde{y}_i)+\epsilon\right)^2}\right) \]

### Conclusion

Through theoretical analysis and experimental verification, the paper shows that the choice of the reduction dimensions \(\Phi\) and the smoothing term \(\epsilon\) of the Dice loss matters when handling missing or empty labels. This provides a valuable reference for future research and encourages...
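
To make the role of \(\Phi\) and \(\epsilon\) concrete, the smoothed Dice loss and its derivative listed under Key formulas can be sketched in NumPy. This is a minimal illustration and not the authors' implementation: the function names, the `(batch, class, height, width)` array layout, the toy data, and the value `eps=1.0` are assumptions chosen for the example. Here `reduce_axes` plays the role of the reduction dimensions: the axes summed inside each DSC term, with the remaining axes indexing the sets in \(\Phi\).

```python
import numpy as np

def dice_loss(y, y_pred, reduce_axes, eps):
    """Smoothed Dice loss: 1 - mean over Phi of (2*I + eps) / (S + eps)."""
    inter = np.sum(y * y_pred, axis=reduce_axes)   # sum of y_i * y~_i per set
    total = np.sum(y + y_pred, axis=reduce_axes)   # sum of (y_i + y~_i) per set
    return 1.0 - np.mean((2.0 * inter + eps) / (total + eps))

def dice_loss_grad(y, y_pred, reduce_axes, eps):
    """Analytic derivative of the smoothed Dice loss w.r.t. every prediction."""
    inter = np.sum(y * y_pred, axis=reduce_axes, keepdims=True)
    total = np.sum(y + y_pred, axis=reduce_axes, keepdims=True)
    n_sets = inter.size                            # |Phi|
    numer = 2.0 * y * (total + eps) - (2.0 * inter + eps)
    return -numer / (n_sets * (total + eps) ** 2)

# Toy batch (assumed layout: batch, class, height, width).
# Element 0 contains foreground; element 1 has an all-zero label map.
y = np.zeros((2, 1, 2, 2))
y[0, 0, 0, :] = 1.0
y_pred = np.full(y.shape, 0.5)

# Reduction over a single batch element, negligible eps.
g_img = dice_loss_grad(y, y_pred, reduce_axes=(2, 3), eps=1e-7)
# Reduction over the whole batch, negligible eps.
g_batch = dice_loss_grad(y, y_pred, reduce_axes=(0, 2, 3), eps=1e-7)
# Single-element reduction with a larger eps (arbitrary illustrative value).
g_eps = dice_loss_grad(y, y_pred, reduce_axes=(2, 3), eps=1.0)

print("all-zero label map, image-wise, tiny eps :", g_img[1].ravel())
print("all-zero label map, batch-wise, tiny eps :", g_batch[1].ravel())
print("all-zero label map, image-wise, eps = 1  :", g_eps[1].ravel())
```

In this toy setup, the element with an all-zero label map receives an almost zero gradient under the image-wise reduction with negligible \(\epsilon\), so its predictions are effectively ignored. Under the batch-wise reduction, or with the larger \(\epsilon\), the same element receives a clearly positive gradient, which pushes its predictions towards zero under gradient descent.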