Evaluation of a deep image-to-image network (DI2IN) auto-segmentation algorithm across a network of cancer centers
Kareem Rayn, Vibhor Gupta, Suneetha Mulinti, Ryan Clark, Anthony Magliari, Suresh Chaudhari, Gokhroo Garima, Sushil Beriwal
DOI: https://doi.org/10.4103/jcrt.jcrt_769_23
2024-04-01
Abstract: Purpose/objectives: Because manual contouring of organs at risk (OARs) is challenging, various automatic contouring solutions have been introduced. Historically, the auto-segmentation algorithms commonly used in the clinic were atlas-based, which required maintaining a library of in-house contours. Searching this library was computationally intensive and could take several minutes. Deep learning approaches have shown significant benefits over atlas-based methods in segmentation accuracy and efficiency. This work is the first multi-institutional study to describe and evaluate an AI algorithm for OAR auto-segmentation based on a deep image-to-image network (DI2IN). Materials/methods: The AI-Rad Companion Organs RT (AIRC) algorithm (Siemens Healthineers, Erlangen, Germany) segments in two steps. First, a trained deep reinforcement learning (DRL) network extracts the target organ region from the optimal input image; this region is then the input to the second step, in which the DI2IN creates the contours. The study was initially designed as a prospective single-center evaluation. Three experienced board-certified radiation oncologists evaluated the AIRC contours on a four-point scale, where 4 is clinically usable and 1 requires re-contouring. After favorable results in this single-center pilot, we expanded the study to six additional institutions with eight additional evaluators, for a total of 11 physician evaluators across seven institutions. Results: One hundred fifty-six patients and 1366 contours were prospectively evaluated.
The five most frequently contoured organs were the lung (136 contours, average rating = 4.0), spinal cord (106 contours, average rating = 3.1), eye globe (80 contours, average rating = 3.9), lens (77 contours, average rating = 3.9), and optic nerve (75 contours, average rating = 4.0). The average rating per evaluator per contour was 3.6, and each evaluator assessed 124 contours on average. Of all contours, 65% were rated 4 and 31% were rated 3; only 4% were rated 1 or 2. Thirty-three organs were evaluated. Nineteen had an average rating of 3.5 or above (ribs, abdominopelvic cavity, skeleton, larynx, lung, aorta, brachial plexus, lens, eye globe, glottis, heart, parotid glands, bladder, kidneys, supraglottic larynx, submandibular glands, esophagus, optic nerve, oral cavity), and the remaining 14 averaged at least 3.0 but below 3.5 (female breast, proximal femur, seminal vesicles, rectum, sternum, brainstem, prostate, brain, lips, mandible, liver, optic chiasm, spinal cord, spleen), so no organ had an average rating below 3.0. Conclusion: AIRC performed well, with more than 95% of contours accepted by the treating physicians with no or minor edits. It supported a fully automated workflow, and AI-powered OAR contouring of this quality has the potential to save time and increase standardization.
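The rating arithmetic behind the Results can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names `summarize_ratings` and `bin_organs` are hypothetical, "accepted with no or minor edits" is assumed to mean a rating of 3 or 4, and the 100-contour sample is invented to mirror the reported 65% / 31% / 4% split (how the 4% divides between scores 1 and 2 is made up). The organ averages used for binning are the five reported above.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(ratings):
    """Summarize contour ratings on a four-point scale
    (4 = clinically usable, 1 = requires re-contouring)."""
    counts = Counter(ratings)
    n = len(ratings)
    # Fraction of contours receiving each score 1..4
    share = {score: counts.get(score, 0) / n for score in (1, 2, 3, 4)}
    return {
        "n": n,
        "mean": mean(ratings),
        "share_4": share[4],
        "share_3": share[3],
        "share_1_or_2": share[1] + share[2],
        # Assumption: "accepted with no or minor edits" = rating of 3 or 4
        "accepted": share[3] + share[4],
    }


def bin_organs(organ_means, threshold=3.5):
    """Split organs into those whose mean rating is >= threshold and the rest."""
    at_or_above = sorted(o for o, m in organ_means.items() if m >= threshold)
    below = sorted(o for o, m in organ_means.items() if m < threshold)
    return at_or_above, below


if __name__ == "__main__":
    # Hypothetical 100-contour sample mirroring the reported 65% / 31% / 4% split;
    # the division of the 4% between scores 1 and 2 is invented for illustration.
    sample = [4] * 65 + [3] * 31 + [2] * 3 + [1] * 1
    print(summarize_ratings(sample))

    # Average ratings reported in the abstract for the five most contoured organs
    reported = {"lung": 4.0, "spinal cord": 3.1, "eye globe": 3.9,
                "lens": 3.9, "optic nerve": 4.0}
    print(bin_organs(reported))

    # Contours per evaluator: 1366 contours / 11 evaluators
    print(round(1366 / 11))
```

With this sample the mean happens to be 3.6 and the share rated 3 or 4 is 0.96, consistent with the reported acceptance of more than 95%; 1366 / 11 rounds to 124, matching the reported per-evaluator average. Of the five organs, only the spinal cord (3.1) falls below the 3.5 threshold.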