Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge

Stefan M. Fischer, Johannes Kiechle, Daniel M. Lang, Jan C. Peeken, Julia A. Schnabel
DOI: https://doi.org/10.59275/j.melba.2024-8g8b
2024-06-20
Abstract: Pathological lymph node delineation is crucial in cancer diagnosis, progression assessment, and treatment planning. The MICCAI 2023 Lymph Node Quantification Challenge published the first public dataset for pathological lymph node segmentation in the mediastinum. Because lymph node annotations are expensive, the challenge was framed as a weakly supervised learning task in which only a subset of the lymph nodes in the training set was annotated. For the challenge submission, we explored multiple methods for training on these weakly annotated data, including noisy label training, loss masking of unlabeled data, and an approach that integrated the TotalSegmentator toolbox as a form of pseudo labeling in order to reduce the number of unknown voxels. Furthermore, multiple public TCIA datasets were incorporated into the training to improve the performance of the deep learning model. Our submitted model achieved a Dice score of 0.628 and an average symmetric surface distance of 5.8 mm on the challenge test set, placing third in the MICCAI 2023 LNQ challenge. One finding of our analysis was that including all visible lymph nodes, non-pathological ones among them, improved the overall segmentation performance on the pathological lymph nodes of the test set. Furthermore, segmentation models trained only on clinically enlarged lymph nodes, as given in the challenge scenario, could not generalize to smaller pathological lymph nodes. The code and model for the challenge submission are available at https://gitlab.lrz.de/compai/MediastinalLymphNodeSegmentation.
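The abstract contrasts three ways of treating unlabeled voxels: noisy label training counts every unlabeled voxel as background, loss masking drops unlabeled voxels from the loss, and pseudo labeling turns some unknown voxels back into known background. The NumPy sketch below illustrates these ideas under stated assumptions and is not the authors' implementation: it assumes a binary task, a boolean array of annotated lymph-node voxels, and a boolean pseudo-label array standing in for TotalSegmentator output of structures assumed not to be lymph nodes. The helper names `build_known_mask` and `masked_bce` are hypothetical.

```python
import numpy as np


def build_known_mask(lymph_node_label, other_structures=None, strategy="mask"):
    """Boolean mask of voxels that contribute to the loss (hypothetical helper).

    lymph_node_label: bool array, True where a lymph node is annotated.
    other_structures: optional bool array, True where a pseudo label (for example
        a TotalSegmentator organ mask) marks tissue assumed not to be a lymph node.
    strategy: "noisy" treats every unlabeled voxel as background, so all voxels
        count; "mask" ignores unlabeled voxels except pseudo-labeled background.
    """
    if strategy == "noisy":
        return np.ones_like(lymph_node_label, dtype=bool)
    if strategy != "mask":
        raise ValueError(f"unknown strategy: {strategy!r}")
    known = lymph_node_label.astype(bool)
    if other_structures is not None:
        # Pseudo-labeled voxels become known background (their target stays 0);
        # annotated lymph-node voxels keep their target of 1.
        known = known | other_structures.astype(bool)
    return known


def masked_bce(prob, target, known, eps=1e-7):
    """Binary cross-entropy averaged over known voxels only."""
    prob = np.clip(prob, eps, 1.0 - eps)
    target = target.astype(float)
    loss = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    n_known = int(known.sum())
    if n_known == 0:
        return 0.0
    return float(loss[known].sum() / n_known)


# Toy 1-D "volume" with six voxels.
label = np.array([1, 1, 0, 0, 0, 0], dtype=bool)   # annotated lymph node
organs = np.array([0, 0, 1, 1, 0, 0], dtype=bool)  # pseudo-labeled other tissue
prob = np.array([0.9, 0.8, 0.1, 0.2, 0.7, 0.6])    # predicted foreground probability

for strategy in ("noisy", "mask"):
    known = build_known_mask(label, organs, strategy)
    print(strategy, int(known.sum()), masked_bce(prob, label, known))
```

In the toy run, the last two voxels are unlabeled but predicted as likely foreground. Under `"noisy"` they are penalized as background; under `"mask"` they are ignored, which is the intended behavior when such voxels might be unannotated lymph nodes. The pseudo-labeled voxels still supply background supervision in both cases.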
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses the segmentation and quantification of mediastinal lymph nodes, in particular pathological ones, in chest CT images. Specifically, it focuses on how to train on incompletely annotated datasets, that is, a **weakly supervised learning task**, while still achieving good segmentation performance.

#### Specific problem description:

1. **Accurate segmentation of pathological lymph nodes**:
   - Assessing the pathological state of lymph nodes is crucial for cancer diagnosis, disease progression assessment, and treatment planning. Traditional methods based on single-slice measurements have limitations and cannot fully capture abnormalities.
   - Accurate three-dimensional segmentation allows a more precise evaluation of lymph node disease and reduces inter-observer variability.
2. **Weakly supervised learning challenges**:
   - Because manual annotation of lymph nodes is time-consuming and expensive, only a subset of the lymph nodes is annotated in the dataset provided by the MICCAI 2023 Lymph Node Quantification (LNQ) Challenge. The training data are therefore incompletely annotated, which makes this a weakly supervised learning task.
   - Training deep learning models effectively on such incompletely annotated data is one of the core problems of the paper.
3. **Generalization to small pathological lymph nodes**:
   - The paper finds that models trained only on clinically enlarged lymph nodes do not generalize to smaller pathological lymph nodes. Improving segmentation performance across lymph node sizes is therefore another key issue.

#### Solution overview:

- **Multiple weakly supervised learning strategies**: noisy label training, loss masking, foreground instance coating, and TotalSegmentator pseudo labels, among others.
- **Additional public datasets**: multiple publicly available CT datasets (such as TCIA Lymph Nodes and NSCLC-Radiomics) are integrated to enlarge the training set and improve the robustness and generalization of the model.
- **Optimized preprocessing**: techniques such as ROI cropping and TotalSegmentator pseudo labels reduce the number of unannotated voxels and increase the proportion of annotated voxels.

#### Experimental results:

- The final submitted model achieved a Dice score of 0.628 and an average symmetric surface distance of 5.8 mm on the challenge test set, placing third in the MICCAI 2023 LNQ challenge.
- Including all visible lymph nodes (non-pathological ones as well) in training improved the segmentation of pathological lymph nodes, whereas models trained only on clinically enlarged lymph nodes did not generalize to smaller pathological lymph nodes.

With these methods, the paper addresses pathological lymph node segmentation in a weakly supervised setting and provides a useful reference for future research.
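The results above are reported as Dice score and average symmetric surface distance (ASSD). The following is a rough NumPy sketch of both metrics for binary 3-D masks and is not the challenge's official evaluation code. It assumes voxel spacing given in mm, defines the surface with 6-connectivity, and computes distances by brute force, so it only suits small arrays. The function names `dice`, `surface_points`, and `assd` are hypothetical.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = int(pred.sum() + gt.sum())
    if denom == 0:
        return 1.0  # both masks empty; treating this as perfect agreement is a convention
    return 2.0 * int(np.logical_and(pred, gt).sum()) / denom


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels that touch the background (6-connectivity)."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = mask.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbour along `axis`; the padding makes volume-edge voxels count as surface.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior) * np.asarray(spacing, dtype=float)


def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance in the units of `spacing` (e.g. mm)."""
    sp, sg = surface_points(pred, spacing), surface_points(gt, spacing)
    if len(sp) == 0 or len(sg) == 0:
        return float("inf")  # undefined when a surface is empty; inf is a convention here
    # Brute-force pairwise distances: fine for toy volumes, far too slow for full CT scans.
    d = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(sp) + len(sg)))


gt = np.zeros((8, 8, 8), dtype=bool)
gt[2:5, 2:5, 2:5] = True       # 3x3x3 "lymph node"
pred = np.zeros_like(gt)
pred[3:6, 2:5, 2:5] = True     # same size, shifted by one voxel along the first axis
print(dice(pred, gt))          # 2 * 18 / (27 + 27), i.e. about 0.667
print(assd(pred, gt, spacing=(1.0, 0.8, 0.8)))
```

ASSD averages the nearest-neighbour distances from every predicted surface voxel to the reference surface and from every reference surface voxel to the predicted surface. For real CT volumes, a Euclidean distance transform would scale far better than the pairwise distance matrix used in this sketch.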