Development and comprehensive evaluation of a national DBCG consensus-based auto-segmentation model for lymph node levels in breast cancer radiotherapy

Emma Skarsø Buhl, Ebbe Laugaard Lorenzen, Lasse Refsgaard, Anders Winther Mølby Nielsen, Annette Torbøl Lund Brixen, Else Maae, Hanne Spangsberg Holm, Joachim Schøler, Linh My Hoang Thai, Louise Wichmann Matthiessen, Maja Vestmø Maraldo, Mathias Maximiliano Nielsen, Marianne Besserman Johansen, Marie Louise Milo, Marie Benzon Mogensen, Mette Holck Nielsen, Mette Møller, Maja Sand, Peter Schultz, Sami Aziz-Jowad Al-Rawi, Saskia Esser-Naumann, Sophie Yammeni, Stine Elleberg Petersen, Birgitte Vrou Offersen, Stine Sofia Korreman
DOI: https://doi.org/10.1016/j.radonc.2024.110567
IF: 6.901
2024-10-07
Radiotherapy and Oncology
Abstract:
Background and purpose: This study aimed to train and validate a deep learning (DL) auto-segmentation model for the nodal clinical target volume (CTVn) in high-risk breast cancer (BC) patients, with both the training and validation datasets created through multi-institutional participation, and with the overall aim of national clinical implementation in Denmark.
Materials and methods: A gold standard (GS) dataset and a high-quality training dataset were created by 21 BC delineation experts from all radiotherapy centres in Denmark. The delineations followed the ESTRO consensus delineation guidelines. Four models were trained: one for each combination of laterality and extension of the CTVn internal mammary nodes. The DL models were tested quantitatively on their own test set and in relation to interobserver variation (IOV) in the GS dataset, using geometric metrics such as the Dice Similarity Coefficient (DSC). A blinded qualitative evaluation was conducted by a national board, which was presented with both DL and manual delineations.
Results: A median DSC > 0.7 was found for all structures except the CTVn interpectoral node in one of the models. In the qualitative evaluation, 'no corrections needed' was recorded for 297 (36%) of the DL structures and 286 (34%) of the manual delineations. A higher rate of 'major corrections' and 'easier to start from scratch' was found for the manual delineations. The models performed within the IOV of an expert group, with two exceptions.
Conclusion: DL models were developed on a national consensus cohort, performed on par with the IOV between BC experts, and had comparable or higher clinical acceptance than expert manual delineations.
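For readers unfamiliar with the DSC named in the abstract, it is conventionally defined as twice the overlap of two segmentations divided by the sum of their sizes, giving 1 for identical contours and 0 for no overlap. The following is a minimal sketch of that standard definition only, not the authors' evaluation code. The function name, the flat 0/1 voxel representation, the toy masks, and the convention for two empty masks are all assumptions made for illustration.

```python
def dice_similarity_coefficient(mask_a, mask_b):
    """Standard DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 voxel labels (a hypothetical
    representation chosen to keep the example dependency-free).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy masks, invented for illustration (not data from the study).
model_mask = [0, 1, 1, 1, 0, 0]
expert_mask = [0, 0, 1, 1, 1, 0]
dsc = dice_similarity_coefficient(model_mask, expert_mask)
print(f"DSC = {dsc:.3f}")
```

In this toy case each mask has three voxels and two overlap, so the DSC is 4/6, which falls just below the 0.7 median threshold reported in the results.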
oncology, radiology, nuclear medicine & medical imaging