Abstract: Rationale and objectives: Accurate pancreas segmentation at CT is critical for identifying pancreatic pathologies and essential for developing imaging biomarkers. Our objective was to benchmark five high-performing pancreas segmentation models across multiple metrics, stratified by scan and patient/pancreatic characteristics that may affect segmentation performance.
Materials and methods: In this retrospective study, PubMed and ArXiv searches were conducted to identify pancreas segmentation models, which were then evaluated on a set of annotated imaging datasets. Results (Dice score, Hausdorff distance [HD], average surface distance [ASD]) were stratified by contrast status and by quartiles of peri-pancreatic attenuation (measured in a 5 mm region around the pancreas). Multivariate regression was performed to identify imaging characteristics and biomarkers (n = 9) that were significantly associated with Dice score.
Results: Five pancreas segmentation models were identified: Abdomen Atlas [AAUNet and AASwin, both trained on 8448 scans], TotalSegmentator [TS, 1204 scans], nnUNetv1 [MSD-nnUNet, 282 scans], and a U-Net-based model for predicting diabetes [DM-UNet, 427 scans]. These were evaluated on 352 CT scans (30 females, 25 males, 297 sex unknown; age 58 ± 7 years [mean ± 1 SD], 327 age unknown) from 2000 to 2023. Overall, TS, AAUNet, and AASwin were the best performers (Dice = 80 ± 11%, 79 ± 16%, and 77 ± 18%, respectively; differences not significant on pairwise Sidak tests). AASwin and MSD-nnUNet performed worse on all metrics on non-contrast scans than on contrast scans (P < .001). The worst performer was DM-UNet (Dice = 67 ± 16%). All algorithms except TS showed lower Dice scores with increasing peri-pancreatic attenuation (P < .01). Multivariate regression showed that non-contrast scans (P < .001, MSD-nnUNet), smaller pancreatic length (P = .005, MSD-nnUNet), and smaller pancreatic height (P = .003, DM-UNet) were associated with lower Dice scores.
Conclusion: The convolutional neural network-based models trained on a diverse set of scans performed best (TS, AAUNet, and AASwin). TS performed equivalently to AAUNet and AASwin with only 13% of the training set size (1204 vs 8448 scans). Though trained on the same dataset, the transformer network (AASwin) performed worse on non-contrast scans, whereas its convolutional counterpart (AAUNet) did not. This study highlights that the aggregate assessment metrics reported for pancreatic segmentation algorithms elsewhere in the literature are insufficient to capture differential performance across common patient and scanning characteristics in clinical populations.
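For readers unfamiliar with the evaluation described above, the following is a minimal Python sketch of two of its ingredients: the Dice score between a predicted and a reference mask, and averaging Dice within quartiles of an attenuation measurement. It is an illustration only, not the authors' pipeline. The function names, the flat-list mask representation, the convention for empty masks, and the default quartile method of `statistics.quantiles` are all assumptions made here.

```python
from statistics import mean, quantiles


def dice_score(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two equal-length binary masks.

    Masks are flat sequences of 0/1 voxel labels (an assumption for this
    sketch; real masks would be 3D arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_dice_by_quartile(attenuation_hu, dice_values):
    """Average Dice within each quartile of a per-scan attenuation value."""
    # Three cut points split the scans into four groups.
    q1, q2, q3 = quantiles(attenuation_hu, n=4)
    groups = {1: [], 2: [], 3: [], 4: []}
    for hu, dice in zip(attenuation_hu, dice_values):
        # Each comparison adds 1 when the value lies above a cut point.
        quartile = 1 + (hu > q1) + (hu > q2) + (hu > q3)
        groups[quartile].append(dice)
    return {q: mean(v) for q, v in groups.items() if v}
```

Calling `mean_dice_by_quartile` with one attenuation value and one Dice value per scan returns a dictionary keyed by quartile (1 to 4), which makes a trend such as falling Dice with rising peri-pancreatic attenuation easy to inspect per model.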

A Comparison of CT-Based Pancreatic Segmentation Deep Learning Models
