Abstract: This IRB-approved study evaluated the quality of contours auto-generated by two deep learning (DL) contouring algorithms for organs-at-risk (OAR) volumes in head and neck cancers. Eleven consecutive head and neck (H&N) patients treated with Tomotherapy were selected for evaluation. Dose prescriptions ranged from 60 to 70 Gy in 30 to 35 fractions. Each patient had three sets of OAR volumes: one drawn by humans (physician and dosimetrist) and used clinically, and two auto-generated by DL contouring solutions trained with convolutional neural network algorithms on large external datasets. The two DL models compared were an H&N model (DLCExpert, Mirada Medical, Oxford, UK) and a Ua-Net model (DeepVoxel Inc, Irvine, CA). Using the human-generated volumes as the ground truth, we evaluated the performance of the two models with three spatial-overlap metrics (Dice coefficient, Jaccard index (JAC) and true positive rate, or sensitivity (TPR)), two surface-distance metrics (95% Hausdorff distance (HD) and average distance (AD)), and one volume metric (volume similarity index (VS)). Seventeen common OAR structures were evaluated: brachial plexus, brainstem, esophagus, eyes, larynx, lenses, mandible, optic nerves and chiasm, oral cavity, parotids, pharyngeal constrictors (PC), submandibular glands (SMGs), spinal cord and trachea. Both DL models offered a feasible solution for delineating structures from CT images. The Mirada model had only 10 of these organs in common for comparison. As shown in Table 1, both models produced comparable results, although the DeepVoxel model matched the human contours more closely for most OARs. The different image segmentation metrics showed consistent results.
DL contours were most similar to human-generated contours for brainstem, esophagus, eyes, larynx, lenses, mandible, parotids, SMGs, spinal cord and trachea, where the Dice, JAC, TPR, HD, AD and VS for the DeepVoxel model were 0.80 (range 0.68-0.91), 0.68 (0.53-0.84), 0.78 (0.67-0.94), 5.1 (2.1-11.4), 1.9 mm (1.0-3.9) and 0.88 (0.76-0.97), respectively. Brachial plexus, optic nerves and chiasm, oral cavity and PC still needed improvement, partly because of differences in organ definition. For example, teeth were included in DeepVoxel's oral cavity but not in the Mirada or human-generated contours. These discrepancies will be corrected in our next DL model. DL auto-generated contours from two different models showed high similarity to human-generated ones for a variety of OARs in the head and neck, with potential to be adopted in routine clinical practice. In contrast with atlas-based or active shape model approaches, DL models are capable of producing contours with a high level of clinical acceptance and show promise of becoming indistinguishable from human-generated ones.

Table 1: Quality of DL-generated contours evaluated by various image segmentation metrics, using human-generated contours as the ground truth. All metrics showed consistent results across the organs. (Truncated)
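The overlap and volume metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the study, so the definitions below are the commonly used ones and should be read as assumptions. The function name `overlap_metrics`, the representation of a contour as a set of voxel indices, and the toy data are all hypothetical. The surface-distance metrics (HD and AD) are omitted because they depend on surface extraction and voxel spacing.

```python
def overlap_metrics(auto, ref):
    """Compare an auto-generated contour with a reference contour.

    Both contours are given as iterables of (x, y, z) voxel indices.
    Formulas are the common textbook definitions (an assumption here,
    not taken from the study).
    """
    auto, ref = set(auto), set(ref)
    if not auto or not ref:
        raise ValueError("both contours must contain at least one voxel")
    tp = len(auto & ref)            # voxels present in both contours
    union = len(auto | ref)         # voxels present in either contour
    total = len(auto) + len(ref)    # sum of the two volumes
    return {
        "dice": 2 * tp / total,                       # Dice coefficient
        "jaccard": tp / union,                        # Jaccard index (JAC)
        "tpr": tp / len(ref),                         # sensitivity (TPR)
        "vs": 1 - abs(len(auto) - len(ref)) / total,  # volume similarity (VS)
    }


# Toy data: two 4x4 single-slice squares, the auto contour shifted one voxel in x.
ref = {(x, y, 0) for x in range(4) for y in range(4)}
auto = {(x, y, 0) for x in range(1, 5) for y in range(4)}
metrics = overlap_metrics(auto, ref)
print(metrics)
```

In this toy case the two contours have equal volume, so VS is 1.0 even though Dice is only 0.75. VS compares volumes only and is insensitive to position, which is one reason to report it alongside overlap and distance metrics.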
