Abstract: This study addresses critical gaps in automated lymphoma segmentation from PET/CT images, focusing on issues often overlooked in existing literature. While deep learning has been applied for lymphoma lesion segmentation, few studies incorporate out-of-distribution testing, raising concerns about model generalizability across diverse imaging conditions and patient populations. We highlight the need to compare model performance with expert human annotators, including intra- and inter-observer variability, to understand task difficulty better. Most approaches focus on overall segmentation accuracy but overlook lesion-specific metrics important for precise lesion detection and disease quantification. To address these gaps, we propose a clinically-relevant framework for evaluating deep neural networks. Using this lesion-specific evaluation, we assess the performance of four deep segmentation networks (ResUNet, SegResNet, DynUNet, and SwinUNETR) across 611 cases from multi-institutional datasets, covering various lymphoma subtypes and lesion characteristics. Beyond standard metrics like the Dice similarity coefficient (DSC), we evaluate clinical lesion measures and their prediction errors. We also introduce detection criteria for lesion localization and propose a new detection Criterion 3 based on metabolic characteristics. We show that networks perform better on large, intense lesions with higher metabolic activity. Finally, we compare network performance to expert human observers via intra- and inter-observer variability analyses, demonstrating that network errors closely resemble those made by experts. Some small, faint lesions remain challenging for both humans and networks. This study aims to improve automated lesion segmentation's clinical relevance, supporting better treatment decisions for lymphoma patients. The code is available at: https://github.com/microsoft/lymphoma-segmentation-dnn


### What problems does this paper attempt to solve?

This paper addresses several key problems in the automatic segmentation and quantification of lymphoma lesions in PET/CT images:

1. **Insufficient model generalization**
   - **External validation and out-of-distribution testing**: Most existing studies lack testing on external or out-of-distribution datasets, which leaves the models' ability to generalize across imaging conditions and patient groups in doubt.
   - **Multi-institutional datasets**: To address this, the study uses datasets from multiple institutions for external validation.
2. **Lack of comprehensive comparison with expert annotations**
   - **Analysis of human observer variability**: Existing studies rarely compare deep learning models against expert human annotators, and in particular rarely analyze intra- and inter-observer variability. Without that baseline, it is difficult to judge the true clinical value of a model.
3. **Ignoring lesion-specific indicators**
   - **Overall segmentation accuracy vs. lesion-specific indicators**: Most existing methods focus on overall segmentation accuracy (such as the Dice similarity coefficient) and ignore lesion-specific indicators (such as lesion size and metabolic activity) that reflect clinical needs. These indicators are crucial for accurate disease detection and quantification.
4. **Lack of clinical relevance**
   - **Clinical lesion measurement standards**: The study proposes a clinically relevant framework for evaluating deep neural networks, so that model output can be aligned with actual diagnostic requirements.
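To make the third problem concrete, the following minimal sketch shows how a high overall DSC can hide a missed lesion. The DSC formula, 2|P∩G| / (|P| + |G|), is standard; the voxel-set representation, the `dice` helper, and the toy lesion sizes are illustrative assumptions and are not taken from the paper's code. Masks are stored as sets of voxel coordinates only to keep the example dependency-free.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Toy ground truth: one large lesion (10x10x10 voxels) and one small lesion (2x2x2 voxels).
large = {(z, y, x) for z in range(10) for y in range(10) for x in range(10)}
small = {(z, y, x) for z in range(20, 22) for y in range(20, 22) for x in range(20, 22)}
truth = large | small

# Hypothetical prediction: the large lesion is segmented perfectly, the small one is missed.
pred = set(large)

overall_dsc = dice(pred, truth)               # dominated by the large lesion
small_lesion_dsc = dice(pred & small, small)  # DSC restricted to the small lesion
print(f"overall DSC = {overall_dsc:.3f}, small-lesion DSC = {small_lesion_dsc:.3f}")
```

In this toy case the overall DSC is 2000/2008, roughly 0.996, even though one of the two lesions is not detected at all, which is the kind of failure that lesion-level measures are meant to expose.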
### Method overview

To address these problems, the study does the following:

- **Multi-institutional datasets**: Uses PET/CT image data of 611 cases from four different institutions, covering different lymphoma subtypes and lesion characteristics.
- **Four commonly used deep segmentation networks**: Evaluates ResUNet, SegResNet, DynUNet, and SwinUNETR.
- **Comprehensive evaluation framework**: Goes beyond standard segmentation metrics (such as the Dice similarity coefficient, DSC) by introducing clinical lesion measures, calculating their prediction errors, and analyzing the relationship between DSC performance and lesion measures.
- **Detection criteria**: Uses three detection criteria (Criterion 1, 2, and 3) to evaluate how well the networks identify and localize lesions; Criterion 3 is newly proposed and is based on metabolic characteristics.
- **Comparison with expert annotations**: Compares network performance with expert human annotators through intra- and inter-observer variability analyses, showing the similarity between network errors and human expert errors.

### Conclusion

Through extensive analysis, the study shows that:

- Deep learning networks perform better on large, metabolically active lesions.
- The error patterns of the networks closely resemble those of human experts.
- Small, faint lesions are challenging even for expert physicians and are difficult to segment consistently.

In summary, the study aims to achieve more consistent and clinically relevant automatic lesion segmentation and to support robust decision-making in lymphoma treatment and management. The evaluation framework can be readily extended to other deep learning networks. The code has been publicly shared to promote reproducibility and further research.
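The lesion-level detection step mentioned in the method overview can be sketched as follows. The summary does not spell out the three criteria, so the rule used here (a predicted lesion counts as a true positive if it overlaps some ground-truth lesion by at least `min_overlap` voxels) is an assumed stand-in and should not be read as the paper's Criterion 1, 2, or 3. The helper names, the 6-connectivity choice, and the toy data are likewise hypothetical.

```python
from collections import deque

# 6-connected neighbourhood in 3D (an assumed choice; 18- or 26-connectivity is also common).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def split_lesions(mask):
    """Split a set of voxel coordinates into connected components (lesions)."""
    remaining = set(mask)
    lesions = []
    while remaining:
        seed = remaining.pop()
        lesion, queue = {seed}, deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    lesion.add(nb)
                    queue.append(nb)
        lesions.append(lesion)
    return lesions

def detection_counts(pred, truth, min_overlap=1):
    """Count true-positive, false-positive, and false-negative lesions."""
    pred_lesions, true_lesions = split_lesions(pred), split_lesions(truth)
    # A predicted lesion is a true positive if it overlaps any ground-truth lesion enough.
    tp = sum(any(len(p & t) >= min_overlap for t in true_lesions) for p in pred_lesions)
    fp = len(pred_lesions) - tp
    # A ground-truth lesion is a false negative if no predicted lesion overlaps it enough.
    fn = sum(all(len(p & t) < min_overlap for p in pred_lesions) for t in true_lesions)
    return tp, fp, fn

def box(z0, y0, x0, size):
    """Hypothetical helper: a cubic blob of voxels used as a toy lesion."""
    return {(z, y, x) for z in range(z0, z0 + size)
            for y in range(y0, y0 + size) for x in range(x0, x0 + size)}

truth = box(0, 0, 0, 4) | box(10, 10, 10, 2)  # one large and one small lesion
pred = box(1, 1, 1, 4) | box(20, 20, 20, 2)   # shifted hit on the large lesion plus a spurious blob

tp, fp, fn = detection_counts(pred, truth)
print(f"TP={tp}, FP={fp}, FN={fn}")
```

Raising `min_overlap` makes the rule stricter; a criterion tied to metabolic characteristics would additionally need the PET intensity values, which this sketch deliberately leaves out.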

Comprehensive framework for evaluation of deep neural networks in detection and quantification of lymphoma from PET/CT images: clinical insights, pitfalls, and observer agreement analyses
