AutoPET Challenge III: Testing the Robustness of Generalized Dice Focal Loss trained 3D Residual UNet for FDG and PSMA Lesion Segmentation from Whole-Body PET/CT Images

Shadab Ahamed
2024-09-16
Abstract: Automated segmentation of cancerous lesions in PET/CT scans is a crucial first step in quantitative image analysis. However, training deep learning models to segment with high accuracy is particularly challenging because lesions vary in size, shape, and radiotracer uptake. These lesions can appear in different parts of the body, often near healthy organs that also exhibit considerable uptake, which makes the task even more complex. As a result, creating an effective segmentation model for routine PET/CT image analysis is challenging. In this study, we utilized a 3D Residual UNet model and employed the Generalized Dice Focal Loss function to train the model on the AutoPET Challenge 2024 dataset. We conducted a 5-fold cross-validation and combined the models from the five folds by average ensembling. In the preliminary test phase for Task-1, the average ensemble achieved a mean Dice Similarity Coefficient (DSC) of 0.6687, a mean false negative volume (FNV) of 10.9522 ml, and a mean false positive volume (FPV) of 2.9684 ml. More details about the algorithm can be found on our GitHub repository: https://github.com/ahxmeds/autosegnet2024.git. The training code has been shared via the repository: https://github.com/ahxmeds/autopet2024.git.
Computer Vision and Pattern Recognition, Artificial Intelligence, Medical Physics
What problem does this paper attempt to address?
This paper addresses the challenge of automatically segmenting cancerous lesions in PET/CT images. Specifically, the authors consider the following issues:

1. **Variations in lesion size, shape, and radiotracer uptake**:
   - Lesions may appear in different parts of the body and are often close to healthy organs that also show significant uptake, which makes the segmentation task more complex.
2. **Limitations of existing methods**:
   - Traditional threshold-based methods often fail to detect lesions with low radiotracer uptake and are prone to false positives in regions of high physiological uptake (such as the brain or bladder).
   - Manual lesion segmentation is very time-consuming and is subject to variability between observers.
3. **Limitations of the dataset**:
   - Most deep-learning models are trained on relatively small private datasets, which limits their generalization ability and slows their integration into routine clinical applications.

To address these challenges, the authors propose the following method:

- **Model architecture**: Use a 3D Residual UNet, a convolutional neural network (CNN) that is particularly suitable for segmenting three-dimensional medical images.
- **Loss function**: Introduce the Generalized Dice Focal Loss (GDFL), which combines the Generalized Dice Loss (GDL) and the Focal Loss (FL) to improve robustness to small lesions and class imbalance.
- **Dataset**: Utilize the large-scale public dataset provided by the AutoPET Challenge 2024, which contains two types of PET images, FDG and PSMA, covering multiple cancer types (such as lymphoma, lung cancer, and melanoma) as well as some negative control samples.
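The abstract states that the models from the five cross-validation folds were combined by average ensembling. The sketch below illustrates one plausible reading of that step. It assumes that the per-voxel foreground probabilities of the folds are averaged and then thresholded at 0.5; the source does not say whether probabilities or logits are averaged, nor which threshold is used. The function name `average_ensemble` and the flattened-list representation are hypothetical and chosen only to keep the example free of dependencies.

```python
def average_ensemble(fold_probs, threshold=0.5):
    """Average per-voxel foreground probabilities over folds, then threshold.

    fold_probs: list with one entry per fold; each entry is a flat list of
                foreground probabilities for the same voxels.
    threshold:  cut-off on the averaged probability (0.5 is an assumption).
    Returns (mean_probs, mask), where mask is a flat list of 0/1 labels.
    """
    if not fold_probs:
        raise ValueError("need at least one fold prediction")
    n_voxels = len(fold_probs[0])
    if any(len(p) != n_voxels for p in fold_probs):
        raise ValueError("all folds must predict the same number of voxels")

    n_folds = len(fold_probs)
    # zip(*fold_probs) groups the predictions of all folds voxel by voxel.
    mean_probs = [sum(vals) / n_folds for vals in zip(*fold_probs)]
    mask = [1 if m >= threshold else 0 for m in mean_probs]
    return mean_probs, mask
```

Averaging before thresholding lets a voxel that some folds rate just below the cut-off still be labeled as lesion when other folds are confident, which is the usual motivation for this kind of ensemble.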
Through these improvements, the authors aim to develop a more accurate and robust automated lesion-segmentation model to support daily clinical decision-making and improve the feasibility of quantitative PET analysis.

### Formula summary

- **Generalized Dice Loss (GDL)**:

  \[ L_{GDL} = 1 - \frac{1}{n_b} \sum_{i = 1}^{n_b} \frac{\sum_{l = 0}^{1} w_i^l \sum_{j = 1}^{N^3} p_{ij}^l g_{ij}^l + \epsilon}{\sum_{l = 0}^{1} w_i^l \sum_{j = 1}^{N^3} (p_{ij}^l + g_{ij}^l) + \eta} \]

  where \( p_{ij}^l \) and \( g_{ij}^l \) are the values of the \( j \)-th voxel in class \( l \) of the \( i \)-th cropped block in the predicted and ground-truth segmentation masks respectively, \( w_i^l = \frac{1}{(\sum_{j = 1}^{N^3} g_{ij}^l)^2} \), \( N^3 \) is the total number of voxels in the cropped cubic block, and \( n_b \) is the batch size.

- **Focal Loss (FL)**:

  \[ L_{FL} = -\frac{1}{n_b} \sum_{i = 1}^{n_b} \sum_{l = 0}^{1} \sum_{j = 1}^{N^3} v_l \, (1 - \sigma(p_{ij}^l))^\gamma \, g_{ij}^l \log(\sigma(p_{ij}^l)) \]

  where \( v_0 = 1 \), \( v_1 = 100 \), \( \sigma(x) = \frac{1}{1 + e^{-x}} \), and \( \gamma = 2 \).

- **Dice Similarity Coefficient (DSC)**:

  \[ DSC = \frac{2|G \cap P|}{|G| + |P|} \]

  where \( G \) and \( P \) denote the ground-truth and predicted lesion masks.
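The formulas above can be checked on toy inputs with a small, dependency-free sketch. It is not the authors' implementation, and it makes several assumptions: the class sum in the GDL is taken over both numerator and denominator; \( \epsilon \) and \( \eta \) are treated as small smoothing constants with the illustrative value `1e-5`; an empty class receives weight 0; and the combined loss is taken to be the unweighted sum GDL + FL with GDL evaluated on sigmoid probabilities. All function names are hypothetical. Inputs are nested lists indexed as `[batch][class][voxel]`.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def generalized_dice_loss(probs, target, eps=1e-5, eta=1e-5):
    """GDL as written above (no factor 2); target is one-hot."""
    total = 0.0
    for p_i, g_i in zip(probs, target):
        num, den = eps, eta
        for p_l, g_l in zip(p_i, g_i):
            g_sum = sum(g_l)
            # w = 1 / (sum_j g)^2; weight 0 for an empty class is an assumption.
            w = 1.0 / g_sum ** 2 if g_sum > 0 else 0.0
            num += w * sum(p * g for p, g in zip(p_l, g_l))
            den += w * sum(p + g for p, g in zip(p_l, g_l))
        total += num / den
    return 1.0 - total / len(probs)


def focal_loss(logits, target, class_weights=(1.0, 100.0), gamma=2.0):
    """FL with v_0 = 1, v_1 = 100 and gamma = 2, applied to raw logits."""
    total = 0.0
    for z_i, g_i in zip(logits, target):
        for v, z_l, g_l in zip(class_weights, z_i, g_i):
            for z, g in zip(z_l, g_l):
                if g:  # terms with g = 0 contribute nothing
                    log_s = log_sigmoid(z)
                    total -= v * (1.0 - math.exp(log_s)) ** gamma * g * log_s
    return total / len(logits)


def generalized_dice_focal_loss(logits, target):
    """Assumed combination: GDL on sigmoid probabilities plus FL on logits."""
    probs = [[[sigmoid(z) for z in z_l] for z_l in z_i] for z_i in logits]
    return generalized_dice_loss(probs, target) + focal_loss(logits, target)


def dice_similarity(gt, pred):
    """DSC = 2|G ∩ P| / (|G| + |P|) for flat binary masks."""
    inter = sum(1 for g, p in zip(gt, pred) if g and p)
    size = sum(gt) + sum(pred)
    # Returning 1.0 when both masks are empty is a convention, not from the source.
    return 2.0 * inter / size if size > 0 else 1.0
```

With the GDL exactly as written above, a perfect prediction gives a value close to 0.5 rather than 0, because the numerator carries no factor of 2; common implementations of the Generalized Dice Loss include that factor. The large foreground weight \( v_1 = 100 \) means that a missed lesion voxel costs far more than a misclassified background voxel, which is the class-imbalance correction the summary refers to.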