[Impression taking].

R. Slavicek

Abstract:

Quantitative and Qualitative Evaluation of a Deep Learning Auto Contouring Model for Prostate Cancer Patients with Hydrogel Spacer

S. Zieminski,J. A. Efstathiou,A. L. Zietman,S. C. Kamran,Y. Wang

DOI: https://doi.org/10.1016/j.ijrobp.2020.07.711

2020-01-01

Abstract:To quantitatively and qualitatively evaluate a deep learning auto contouring model for prostate radiotherapy patients with pretreatment insertion of a hydrogel spacer (about water equivalent with no contrast) between prostate and rectum. The model employs convolutional neural networks (CNN) to learn features from input images that can be used to generate semantic segmentation. The study used 163 patients from three specialized GU radiation oncologists (referred to as A/B/C). The first 135 patients (A/B/C = 82/39/14) were used for training (125) and validation (10). The validation patients were randomly selected. The validated model was tested on 28 patients (A/B/C = 18/6/4) accrued during model development. There was no change of practice during the whole period. A simulation CT and MR were taken on the same day for each patient. In manual contouring, with MR fused to CT, spacer was contoured on T2 MR, prostate on CT with MR guidance, and other structures on CT only. The model was trained to auto contour prostate, proximal seminal vesicles (SV), bladder, rectum, penile bulb, femurs and spacer on CT without MR. Quantitatively, auto contours were evaluated against manual contours using the following metrics: sensitivity (% of voxels correctly drawn), false positive rate (FPR, % of voxels overdrawn), dice similarity coefficient (DSC), 95-percentile of Hausdorff distance (HD) and mean distances (dmean) between the two contours over all slices. The structures with high DSC were qualitatively evaluated by the original attending using a 1 (acceptable with minor editing), 2 (editable with efficiency gain over manual contouring) and 3 (rejected for no efficiency gain or gross error) scoring system. A gross error on rectum occurred for two patients (A/B = 1/1). These two points were excluded from quantitative analysis but counted as rejected in qualitative evaluation. 
On average, DSC was high for femurs (>0.95) and bladder (0.91), moderate for prostate (0.85) and rectum (0.81), but low for bulb (0.67), proximal SV (0.62) and spacer (0.52). For right femur/left femur/bladder/prostate/rectum, sensitivity = 0.93/0.92/0.88/0.86/0.81, FPR = 1.8%/1.5%/4.5%/15%/17%, 95%-HD = 2.8/2.6/12.1/7.4/9.5 mm, and dmean = 0.9/1.0/2.6/2.5/2.4 mm. Qualitatively, femurs scored 1 in all cases. The average scores for bladder/prostate/rectum = 1.28/1.44/1.50, 1.83/2.17/1.67, 1.25/1.50/1.25 for physicians A, B, C, respectively, and 1.39/1.61/1.50 overall. Prostate and rectum both scored well below 2, despite their lower quantitative performance, as some errors caused by the inaccurate prediction of spacer without MR were deemed easily correctable by the physicians. The model produced clinically satisfactory results, both quantitatively and qualitatively, for femurs, bladder, prostate and rectum. The results for proximal SV and bulb were less ideal. The model drew the spacer in the correct location, but could not draw it accurately due to lack of contrast on CT.
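The overlap metrics named in this abstract can be illustrated with a minimal sketch. It assumes each contour is given as a set of voxel index tuples, and it assumes the "overdrawn" fraction is normalized by the manual volume; the abstract does not state the denominator, so that choice and the function name `overlap_metrics` are illustrative only.

```python
def overlap_metrics(auto_voxels, manual_voxels):
    """Overlap metrics between an auto contour and a manual contour.

    Both inputs are iterables of voxel index tuples, e.g. (x, y, z).
    Assumption: the overdrawn fraction is reported relative to the
    manual volume; the source abstract does not specify this.
    """
    auto = set(auto_voxels)
    manual = set(manual_voxels)
    if not auto or not manual:
        raise ValueError("both contours must contain at least one voxel")

    true_positive = len(auto & manual)   # voxels drawn in both contours
    overdrawn = len(auto - manual)       # voxels only in the auto contour

    return {
        "sensitivity": true_positive / len(manual),
        "overdrawn_fraction": overdrawn / len(manual),
        "dsc": 2 * true_positive / (len(auto) + len(manual)),
    }


# Toy example: two 10-voxel columns shifted by 2 voxels along z.
manual = {(0, 0, z) for z in range(10)}
auto = {(0, 0, z) for z in range(2, 12)}
metrics = overlap_metrics(auto, manual)
```

In this toy case 8 of the 10 manual voxels are recovered and 2 voxels are overdrawn, so sensitivity and DSC are both 0.8 and the overdrawn fraction is 0.2.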
Deep learning-based automatic contour quality assurance for auto-segmented abdominal MR-Linac contours

Mohammad Zarenia,Ying Zhang,Christina Sarosiek,Renae Conlin,Asma Amjad,Eric S Paulson

DOI: https://doi.org/10.1088/1361-6560/ad87a6

IF: 3.5

2024-10-17

Physics in Medicine and Biology

Abstract:Objective. Deep-learning auto-segmentation (DLAS) aims to streamline contouring in clinical settings. Nevertheless, achieving clinical acceptance of DLAS remains a hurdle in abdominal MRI, hindering the implementation of efficient clinical workflows for MR-guided online adaptive radiotherapy (MRgOART). Integrating automated contour quality assurance (ACQA) with automatic contour correction (ACC) techniques could optimize the performance of ACC by concentrating on inaccurate contours. Furthermore, ACQA can facilitate the contour selection process from various DLAS tools and/or deformable contour propagation from a prior treatment session. Here, we present the performance of novel DL-based 3D ACQA models for evaluating DLAS contours acquired during MRgOART. Approach. The ACQA model, based on a 3D convolutional neural network (CNN), was trained using pancreas and duodenum contours obtained from a research DLAS tool on abdominal MRIs acquired from a 1.5T MR-Linac. The training dataset contained abdominal MR images, DL contours, and their corresponding quality ratings, from 103 datasets. The quality of DLAS contours was determined using an in-house contour classification tool, which categorizes contours as acceptable or edit-required based on the expected editing effort. The performance of the 3D ACQA model was evaluated using an independent dataset of 34 abdominal MRIs, utilizing confusion matrices for true and predicted classes. Main results. The ACQA predicted 'acceptable' and 'edit-required' contours at 72.2% (91/136) and 83.6% (726/868) accuracy for pancreas, and at 71.2% (79/111) and 89.6% (772/862) for duodenum contours, respectively. The model successfully identified false positive (extra) and false negative (missing) DLAS contours at 93.75% (15/16) and 99.7% (438/439) accuracy for pancreas, and at 95% (57/60) and 98.9% (91/99) for duodenum, respectively. Significance. We developed 3D-ACQA models capable of quickly evaluating the quality of DLAS pancreas and duodenum contours on abdominal MRI. These models can be integrated into clinical workflow, facilitating an efficient and consistent contour evaluation process in MRgOART for abdominal malignancies.

engineering, biomedical,radiology, nuclear medicine & medical imaging
A multi-modal vision-language pipeline strategy for contour quality assurance and adaptive optimization

Shunyao Luan,Jun Ou-Yang,Xiaofei Yang,Wei Wei,Xudong Xue,Benpeng Zhu,Jun Ou-yang

DOI: https://doi.org/10.1088/1361-6560/ad2a97

IF: 3.5

2024-02-21

Physics in Medicine and Biology

Abstract:Objective: Accurate delineation of organs-at-risk (OARs) is a critical step in radiotherapy. The deep learning generated segmentations usually need to be reviewed and corrected by oncologists manually, which is time-consuming and operator-dependent. Therefore, an automated quality assurance (QA) and adaptive optimization correction strategy was proposed to identify and optimize "incorrect" auto-segmentations. Approach: A total of 586 CT images and labels from nine institutions were used. The OARs included the brainstem, parotid, and mandible. The deep learning generated contours were compared with the manual ground truth delineations. In this study, we proposed a novel Contour Quality Assurance and Adaptive Optimization (CQA-AO) strategy, which consists of the following three main components: 1) The contour QA module classified the deep learning generated contours as either accepted or unaccepted; 2) The unacceptable contour categories analysis module provided the potential error reasons (five unacceptable categories) and locations (attention heatmaps); 3) The adaptive correction of unacceptable contours module integrates vision-language representations and utilizes convex optimization algorithms to achieve adaptive correction of "incorrect" contours. Main results: In the contour quality assurance tasks, the sensitivity (accuracy, precision) of CQA-AO strategy reached 0.940 (0.945, 0.948), 0.962 (0.937, 0.913), and 0.967 (0.962, 0.957) for brainstem, parotid and mandible, respectively. In the unacceptable contour category analysis, the (F_I, Acc_I, F_micro, F_macro) of CQA-AO strategy reached (0.901, 0.763, 0.862, 0.822), (0.855, 0.737, 0.837, 0.784), and (0.907, 0.762, 0.858, 0.821) for brainstem, parotid and mandible, respectively. After adaptive optimization correction, the DSC values of brainstem, parotid and mandible have been improved by 9.4%, 25.9%, and 13.5%, and HD values decreased by 62%, 70.6%, and 81.6%, respectively. Significance: The proposed CQA-AO strategy, which combines quality assurance of contour and adaptive optimization correction for OARs contouring, demonstrated superior performance compared to conventional methods. This method can be implemented in the clinical contouring procedures and improve the efficiency of delineating and reviewing workflow.

engineering, biomedical,radiology, nuclear medicine & medical imaging
Development and Evaluation of the First Pediatric Deep-Learning Auto-Contouring Models for Cranio-Spinal Irradiation (CSI)

S. Zieminski,S. MacDonald,P. Looney,Y. Wang

DOI: https://doi.org/10.1016/j.ijrobp.2021.07.665

2021-01-01

Abstract:The deep-learning models provided high-quality auto contours, directly acceptable for lungs and requiring only minor editing in less than one minute for thecal sac, vertebrae and kidneys. Given the labor-intensive nature of manually contouring thecal sac and vertebrae, the high accuracy of the auto contours resulted in substantial reduction of contouring time and thus faster turnaround for CSI treatment. The models can provide a uniform starting point for CSI contouring in our field, which is particularly beneficial to clinics with less experience.
Quality Control-Driven Image Segmentation Towards Reliable Automatic Image Analysis in Large-Scale Cardiovascular Magnetic Resonance Aortic Cine Imaging

Evan Hann,Luca Biasiolli,Qiang Zhang,Iulia A. Popescu,Konrad Werys,Elena Lukaschuk,Valentina Carapella,Jose M. Paiva,Nay Aung,Jennifer J. Rayner,Kenneth Fung,Henrike Puchta,Mihir M. Sanghvi,Niall O. Moon,Katharine E. Thomas,Vanessa M. Ferreira,Steffen E. Petersen,Stefan Neubauer,Stefan K. Piechnik

DOI: https://doi.org/10.1007/978-3-030-32245-8_83

2019-01-01

Abstract:Recent progress in fully-automated image segmentation has enabled efficient extraction of clinical parameters in large-scale clinical imaging studies, reducing laborious manual processing. However, the current state-of-the-art automatic image segmentation may still fail, especially when it comes to atypical cases. Visual inspection of segmentation quality is often required, thus diminishing the improvements in efficiency. This drives an increasing need to enhance the overall data processing pipeline with robust automatic quality scoring, especially for clinical applications. We present a novel quality control-driven (QCD) framework to provide reliable segmentation using a set of different neural networks. In contrast to the prior segmentation and quality scoring methods, the proposed framework automatically selects the optimal segmentation on-the-fly from the multiple candidate segmentations available, directly utilizing the inherent Dice similarity coefficient (DSC) predictions. We trained and evaluated the framework on large-scale cardiovascular magnetic resonance aortic cine image sequences from the UK Biobank Study. The framework achieved segmentation accuracy of mean DSC at 0.966, mean prediction error of DSC within 0.015, and mean error in estimating lumen area ≤17.6 mm² for both ascending aorta and proximal descending aorta. This novel QCD framework successfully integrates the automatic image segmentation along with detection of critical errors on a per-case basis, paving the way towards reliable fully-automatic extraction of clinical parameters for large-scale imaging studies.
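The selection step described in this abstract (choosing among candidate segmentations by their predicted DSC) can be sketched as follows. This is not the authors' implementation: the function name `select_segmentation`, the candidate names, and the 0.9 review threshold are hypothetical, and the predicted DSC values are assumed to come from some upstream quality-scoring model.

```python
def select_segmentation(predicted_dsc_by_candidate, min_predicted_dsc=0.9):
    """Pick the candidate segmentation with the highest predicted DSC.

    predicted_dsc_by_candidate: dict mapping a candidate name to the DSC
    predicted for it by a quality-scoring model (assumed to exist upstream).
    min_predicted_dsc: hypothetical threshold below which the case is
    flagged for manual review.

    Returns (best_name, best_predicted_dsc, needs_review).
    """
    if not predicted_dsc_by_candidate:
        raise ValueError("at least one candidate segmentation is required")

    best_name = max(predicted_dsc_by_candidate,
                    key=predicted_dsc_by_candidate.get)
    best_score = predicted_dsc_by_candidate[best_name]
    return best_name, best_score, best_score < min_predicted_dsc


# Toy example with three hypothetical networks.
choice = select_segmentation({"net_a": 0.91, "net_b": 0.97, "net_c": 0.88})
flagged = select_segmentation({"net_a": 0.60, "net_b": 0.70})
```

The second call shows the per-case error detection idea: even the best candidate falls below the threshold, so the case is flagged rather than silently accepted.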
Automatic contouring QA method using a deep learning–based autocontouring system

Dong Joo Rhee,Chidinma P. Anakwenze Akinfenwa,Bastien Rigaud,Anuja Jhingran,Carlos E. Cardenas,Lifei Zhang,Surendra Prajapati,Stephen F. Kry,Kristy K. Brock,Beth M. Beadle,William Shaw,Frederika O'Reilly,Jeannette Parkes,Hester Burger,Nazia Fakie,Chris Trauernicht,Hannah Simonds,Laurence E. Court

DOI: https://doi.org/10.1002/acm2.13647

2022-05-18

Journal of applied clinical medical physics [electronic resource] / American College of Medical Physics

Abstract:Purpose To determine the most accurate similarity metric when using an independent system to verify automatically generated contours. Methods A reference autocontouring system (primary system to create clinical contours) and a verification autocontouring system (secondary system to test the primary contours) were used to generate a pair of 6 female pelvic structures (UteroCervix [uterus + cervix], CTVn [nodal clinical target volume (CTV)], PAN [para‐aortic lymph nodes], bladder, rectum, and kidneys) on 49 CT scans from our institution and 38 from other institutions. Additionally, clinically acceptable and unacceptable contours were manually generated using the 49 internal CT scans. Eleven similarity metrics (volumetric Dice similarity coefficient (DSC), Hausdorff distance, 95% Hausdorff distance, mean surface distance, and surface DSC with tolerances from 1 to 10 mm) were calculated between the reference and the verification autocontours, and between the manually generated and the verification autocontours. A support vector machine (SVM) was used to determine the threshold that separates clinically acceptable and unacceptable contours for each structure. The 11 metrics were investigated individually and in certain combinations. Linear, radial basis function, sigmoid, and polynomial kernels were tested using the combinations of metrics as inputs for the SVM. Results The highest contouring error detection accuracies were 0.91 for the UteroCervix, 0.90 for the CTVn, 0.89 for the PAN, 0.92 for the bladder, 0.95 for the rectum, and 0.97 for the kidneys and were achieved using surface DSCs with a thickness of 1, 2, or 3 mm. The linear kernel was the most accurate and consistent when a combination of metrics was used as an input for the SVM. However, the best model accuracy from the combinations of metrics was not better than the best model accuracy from a surface DSC as an input. 
Conclusions We distinguished clinically acceptable contours from clinically unacceptable contours with an accuracy higher than 0.9 for the targets and critical structures in patients with cervical cancer; the most accurate similarity metric was surface DSC with a thickness of 1, 2, or 3 mm.
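Surface DSC, the metric this abstract found most accurate, can be approximated on point-sampled surfaces as the fraction of surface points of each contour that lie within a tolerance of the other contour's surface. The sketch below uses a brute-force nearest-point search and assumes surfaces are supplied as lists of coordinate tuples in mm; the function name `surface_dsc` is illustrative, and point sampling is a simplification of area-weighted definitions.

```python
import math


def surface_dsc(surface_a, surface_b, tolerance_mm):
    """Approximate surface DSC between two point-sampled surfaces.

    surface_a, surface_b: lists of coordinate tuples in mm.
    tolerance_mm: maximum distance at which a surface point counts
    as agreeing with the other surface.
    """
    if not surface_a or not surface_b:
        raise ValueError("both surfaces must contain at least one point")

    def count_within(source, target):
        # Number of source points whose nearest target point is in tolerance.
        return sum(
            1 for p in source
            if min(math.dist(p, q) for q in target) <= tolerance_mm
        )

    agreeing = (count_within(surface_a, surface_b)
                + count_within(surface_b, surface_a))
    return agreeing / (len(surface_a) + len(surface_b))


# Toy 2D example: two nearly parallel contours with one outlier point.
reference = [(0, 0), (1, 0), (2, 0), (3, 0)]
verification = [(0, 0.5), (1, 0.5), (2, 0.5), (3, 4)]
sdsc_1mm = surface_dsc(reference, verification, 1.0)
sdsc_2mm = surface_dsc(reference, verification, 2.0)
```

Loosening the tolerance from 1 mm to 2 mm raises the score from 0.75 to 0.875 here, which illustrates why the tolerance has to be tuned per structure before a threshold can separate acceptable from unacceptable contours.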
Deep learning for contour quality assurance for RTOG 0933: In-silico evaluation

Evan M Porter,Charles Vu,Ina M Sala,Thomas Guerrero,Zaid A Siddiqui

DOI: https://doi.org/10.1016/j.radonc.2024.110519

2024-08-31

Abstract:Purpose: To validate a CT-based deep learning (DL) hippocampal segmentation model trained on a single-institutional dataset and explore its utility for multi-institutional contour quality assurance (QA). Methods: A DL model was trained to contour hippocampi from a dataset generated by an institutional observer (IO) contouring on brain MRIs from a single-institution cohort. The model was then evaluated on the RTOG 0933 dataset by comparing the treating physician (TP) contours to blinded IO and DL contours using Dice and Hausdorff distance (HD) agreement metrics as well as evaluating differences in dose to hippocampi when TP vs. IO vs. DL contours are used for planning. The specificity and sensitivity of the DL model to capture planning discrepancies was quantified using criteria of HD > 7 mm and Dmax hippocampi > 17 Gy. Results: The DL model showed greater agreement with IO contours compared to TP contours (DL:IO L/R Dice 74 %/73 %, HD 4.86/4.74; DL:TP L/R Dice 62 %/65 %, HD 7.23/6.94, all p < 0.001). Thirty percent of contours and 53 % of dose plans failed QA. The DL model achieved an AUC L/R 0.80/0.79 on the contour QA task via Hausdorff comparison and AUC of 0.91 via Dmax comparison. The false negative rate was 17.2 %/20.5 % (contours) and 5.8 % (dose). False negative cases tended to demonstrate a higher DL:IO Dice agreement (L/R p = 0.42/0.03) and better qualitative visual agreement compared with true positive cases. Conclusion: Our study demonstrates the feasibility of using a single-institutional DL model to perform contour QA on a multi-institutional trial for the task of hippocampal segmentation.
Automated Quality Control in Image Segmentation: Application to the UK Biobank Cardiac MR Imaging Study

Robert Robinson,Vanya V. Valindria,Wenjia Bai,Ozan Oktay,Bernhard Kainz,Hideaki Suzuki,Mihir M. Sanghvi,Nay Aung,José Miguel Paiva,Filip Zemrak,Kenneth Fung,Elena Lukaschuk,Aaron M. Lee,Valentina Carapella,Young Jin Kim,Stefan K. Piechnik,Stefan Neubauer,Steffen E. Petersen,Chris Page,Paul M. Matthews,Daniel Rueckert,Ben Glocker

DOI: https://doi.org/10.48550/arXiv.1901.09351

2019-01-27

Computer Vision and Pattern Recognition

Abstract:Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation isn't feasible at large scale. However, it's important to be able to automatically detect when a segmentation method fails so as to avoid inclusion of wrong measurements into subsequent analyses which could lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy, which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale manually-annotated set of 4,800 cardiac magnetic resonance scans. We then apply our method to a large cohort of 7,250 cardiac MRI on which we have performed manual QC. Results: We report results used for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low and high quality segmentations using predicted DSC scores. As further validation we show high correlation between real and predicted scores and 95% classification accuracy on 4,800 scans for which manual segmentations were available. We mimic real-world application of the method on 7,250 cardiac MRI where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that RCA has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging as in the UK Biobank Imaging Study.
Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study

Robert Robinson,Vanya V. Valindria,Wenjia Bai,Ozan Oktay,Bernhard Kainz,Hideaki Suzuki,Mihir M. Sanghvi,Nay Aung,José Miguel Paiva,Filip Zemrak,Kenneth Fung,Elena Lukaschuk,Aaron M. Lee,Valentina Carapella,Young Jin Kim,Stefan K. Piechnik,Stefan Neubauer,Steffen E. Petersen,Chris Page,Paul M. Matthews,Daniel Rueckert,Ben Glocker

DOI: https://doi.org/10.1186/s12968-019-0523-x

IF: 6.4

2019-03-14

Journal of Cardiovascular Magnetic Resonance

Abstract:Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools such as image segmentation methods are employed to derive quantitative measures or biomarkers for further analyses. Manual inspection and visual QC of each segmentation result is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails in order to avoid inclusion of wrong measurements into subsequent analyses which could otherwise lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy, which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale manually-annotated set of 4800 cardiovascular magnetic resonance (CMR) scans. We then apply our method to a large cohort of 7250 CMR on which we have performed manual QC. Results: We report results used for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low and high quality segmentations using the predicted DSC scores. As further validation we show high correlation between real and predicted scores and 95% classification accuracy on 4800 scans for which manual segmentations were available. We mimic real-world application of the method on 7250 CMR where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that Reverse Classification Accuracy has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging as in the UK Biobank Imaging Study.

radiology, nuclear medicine & medical imaging,cardiac & cardiovascular systems
Machine Learning-Based Quality Assurance for Automatic Segmentation of Head-and-Neck Organs-at-Risk in Radiotherapy

Shunyao Luan,Xudong Xue,Changchao Wei,Yi Ding,Benpeng Zhu,Wei Wei

DOI: https://doi.org/10.1177/15330338231157936

2023-02-16

Technology in Cancer Research & Treatment

Abstract:Purpose/Objective(s): With the development of deep learning, more convolutional neural networks (CNNs) are being introduced in automatic segmentation to reduce oncologists' labor requirement. However, oncologists still have to spend considerable time evaluating the quality of the contours generated by the CNNs. Besides, all the evaluation criteria, such as Dice Similarity Coefficient (DSC), need a gold standard to assess the quality of the contours. To address these problems, we propose an automatic quality assurance (QA) method using isotropic and anisotropic methods to automatically analyze contour quality without a gold standard. Materials/Methods: We used 196 individuals with 18 different head-and-neck organs-at-risk. The overall process has the following 4 main steps. (1) Use a CNN segmentation network to generate a series of contours, then use these contours as organ masks to erode and dilate to generate inner/outer shells for each 2D slice. (2) Extract thirty-eight radiomics features from these 2 shells, and use the inner/outer shells' radiomics feature ratios and DSCs as the input for 12 machine learning models. (3) Adaptively classify the passing/un-passing slices using the DSC threshold. (4) Quantitatively evaluate the un-passing slices with 2 different threshold analysis methods to obtain a series of location information of poor contours. Parts 1-3 were isotropic experiments, and part 4 was the anisotropic method. Result: From the isotropic experiments, almost all the predicted values were close to the labels. Through the anisotropic method, we obtained the contours' location information by assessing the thresholds of the peak-to-peak and area-to-area ratios. Conclusion: The proposed automatic segmentation QA method could predict the segmentation quality qualitatively. Moreover, the method can analyze the location information for un-passing slices.
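Step (1) of this abstract, eroding and dilating an organ mask to obtain inner and outer shells on a 2D slice, can be sketched as below. The abstract does not give the structuring element or shell width, so a one-pixel, 4-connected neighbourhood is assumed here, and the function names are illustrative.

```python
def four_neighbours(pixel):
    """4-connected neighbours of a (row, col) pixel (assumed structuring element)."""
    r, c = pixel
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def inner_outer_shells(mask_pixels):
    """Return (inner_shell, outer_shell) for a 2D binary mask.

    mask_pixels: iterable of (row, col) tuples that belong to the organ.
    inner_shell: mask pixels removed by a one-pixel erosion.
    outer_shell: pixels added by a one-pixel dilation.
    """
    mask = set(mask_pixels)

    # Erosion keeps pixels whose neighbours are all inside the mask.
    eroded = {p for p in mask if all(n in mask for n in four_neighbours(p))}
    # Dilation adds every neighbour of every mask pixel.
    dilated = mask | {n for p in mask for n in four_neighbours(p)}

    return mask - eroded, dilated - mask


# Toy example: a 3x3 square mask.
square = {(r, c) for r in range(1, 4) for c in range(1, 4)}
inner, outer = inner_outer_shells(square)
```

For the 3x3 square, only the centre pixel survives erosion, so the inner shell has 8 pixels, and dilation adds 12 pixels around the edges. Features computed on the two shells could then be compared as ratios, as the abstract describes.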

oncology
A Proof-of-Concept Study of Artificial Intelligence Assisted Contour Revision

Ti Bai,Anjali Balagopal,Michael Dohopolski,Howard E. Morgan,Rafe McBeth,Jun Tan,Mu-Han Lin,David J. Sher,Dan Nguyen,Steve Jiang

DOI: https://doi.org/10.48550/arXiv.2107.13465

2021-07-29

Abstract:Automatic segmentation of anatomical structures is critical for many medical applications. However, the results are not always clinically acceptable and require tedious manual revision. Here, we present a novel concept called artificial intelligence assisted contour revision (AIACR) and demonstrate its feasibility. The proposed clinical workflow of AIACR is as follows: given an initial contour that requires a clinician's revision, the clinician indicates where a large revision is needed, and a trained deep learning (DL) model takes this input to update the contour. This process repeats until a clinically acceptable contour is achieved. The DL model is designed to minimize the clinician's input at each iteration and to minimize the number of iterations needed to reach acceptance. In this proof-of-concept study, we demonstrated the concept on 2D axial images of three head-and-neck cancer datasets, with the clinician's input at each iteration being one mouse click on the desired location of the contour segment. The performance of the model is quantified with Dice Similarity Coefficient (DSC) and 95th percentile of Hausdorff Distance (HD95). The average DSC/HD95 (mm) of the auto-generated initial contours were 0.82/4.3, 0.73/5.6 and 0.67/11.4 for three datasets, which were improved to 0.91/2.1, 0.86/2.4 and 0.86/4.7 with three mouse clicks, respectively. Each DL-based contour update requires around 20 ms. We proposed a novel AIACR concept that uses DL models to assist clinicians in revising contours in an efficient and effective way, and we demonstrated its feasibility by using 2D axial CT images from three head-and-neck cancer datasets.

Computer Vision and Pattern Recognition,Image and Video Processing
Evaluation of Deep Learning‐based Auto‐segmentation Algorithms for Delineating Clinical Target Volume and Organs at Risk Involving Data for 125 Cervical Cancer Patients

Zhi Wang,Yankui Chang,Zhao Peng,Yin Lv,Weijiong Shi,Fan Wang,Xi Pei,X. George Xu

DOI: https://doi.org/10.1002/acm2.13097

2020-01-01

Journal of Applied Clinical Medical Physics

Abstract:Objective To compare the accuracy of a deep learning-based auto-segmentation model with that of manual contouring by one medical resident, where both entities tried to mimic the delineation "habits" of the same clinical senior physician. Methods This study included 125 cervical cancer patients whose clinical target volumes (CTVs) and organs at risk (OARs) were delineated by the same senior physician. Of these 125 cases, 100 were used for model training and the remaining 25 for model testing. In addition, the medical resident instructed by the senior physician for approximately 8 months delineated the CTVs and OARs for the testing cases. The dice similarity coefficient (DSC) and the Hausdorff Distance (HD) were used to evaluate the delineation accuracy for CTV, bladder, rectum, small intestine, femoral-head-left, and femoral-head-right. Results The DSC values of the auto-segmentation model and manual contouring by the resident were, respectively, 0.86 and 0.83 for the CTV (P < 0.05), 0.91 and 0.91 for the bladder (P > 0.05), 0.88 and 0.84 for the femoral-head-right (P < 0.05), 0.88 and 0.84 for the femoral-head-left (P < 0.05), 0.86 and 0.81 for the small intestine (P < 0.05), and 0.81 and 0.84 for the rectum (P > 0.05). The HD (mm) values were, respectively, 14.84 and 18.37 for the CTV (P < 0.05), 7.82 and 7.63 for the bladder (P > 0.05), 6.18 and 6.75 for the femoral-head-right (P > 0.05), 6.17 and 6.31 for the femoral-head-left (P > 0.05), 22.21 and 26.70 for the small intestine (P > 0.05), and 7.04 and 6.13 for the rectum (P > 0.05). The auto-segmentation model took approximately 2 min to delineate the CTV and OARs while the resident took approximately 90 min to complete the same task. Conclusion The auto-segmentation model was as accurate as the medical resident but with much better efficiency in this study. Furthermore, the auto-segmentation approach offers additional perceivable advantages of being consistent and ever improving when compared with manual approaches.
SegQC: a segmentation network-based framework for multi-metric segmentation quality control and segmentation error detection in volumetric medical images

Bella Specktor-Fadida,Liat Ben-Sira,Dafna Ben-Bashat,Leo Joskowicz

2024-11-12

Abstract:Quality control of structures segmentation in volumetric medical images is important for identifying segmentation errors in clinical practice and for facilitating model development. This paper introduces SegQC, a novel framework for segmentation quality estimation and segmentation error detection. SegQC computes an estimate measure of the quality of a segmentation in volumetric scans and in their individual slices and identifies possible segmentation error regions within a slice. The key components include: 1. SegQC-Net, a deep network that inputs a scan and its segmentation mask and outputs segmentation error probabilities for each voxel in the scan; 2. three new segmentation quality metrics, two overlap metrics and a structure size metric, computed from the segmentation error probabilities; 3. a new method for detecting possible segmentation errors in scan slices computed from the segmentation error probabilities. We introduce a new evaluation scheme to measure segmentation error discrepancies based on an expert radiologist's corrections of automatically produced segmentations that yields smaller observer variability and is closer to actual segmentation errors. We demonstrate SegQC on three fetal structures in 198 fetal MRI scans: fetal brain, fetal body and the placenta. To assess the benefits of SegQC, we compare it to the unsupervised Test Time Augmentation (TTA)-based quality estimation. Our studies indicate that SegQC outperforms TTA-based quality estimation in terms of Pearson correlation and MAE for fetal body and fetal brain structures segmentation. Our segmentation error detection method achieved recall and precision rates of 0.77 and 0.48 for fetal body, and 0.74 and 0.55 for fetal brain segmentation error detection respectively. SegQC enhances segmentation metrics estimation for whole scans and individual slices, and also provides error region detection.

Image and Video Processing,Computer Vision and Pattern Recognition,Machine Learning
CNN-Based Quality Assurance for Automatic Segmentation of Breast Cancer in Radiotherapy.

Xinyuan Chen,Kuo Men,Bo Chen,Yu Tang,Tao Zhang,Shulian Wang,Yexiong Li,Jianrong Dai

DOI: https://doi.org/10.3389/fonc.2020.00524

IF: 4.7

2020-01-01

Frontiers in Oncology

Abstract:Purpose: More and more automatic segmentation tools are being introduced in routine clinical practice. However, physicians need to spend a considerable amount of time examining the generated contours slice by slice, which greatly reduces the benefit of the tool's automaticity. To overcome this shortcoming, we developed an automatic quality assurance (QA) method for automatic segmentation using convolutional neural networks (CNNs). Materials and Methods: The study cohort comprised 680 patients with early-stage breast cancer who received whole breast radiation. The overall architecture of the automatic QA method for deep learning-based segmentation included two main parts: a segmentation CNN model and a QA network based on ResNet-101. The inputs were computed tomography images, segmentation probability maps, and uncertainty maps. Two kinds of Dice similarity coefficient (DSC) outputs were tested. One predicted the DSC quality level of each slice ([0.95, 1] for "good," [0.8, 0.95] for "medium," and [0, 0.8] for "bad" quality), and the other predicted the DSC value of each slice directly. The method's performance in predicting the quality levels was evaluated with quantitative metrics: balanced accuracy, F score, and the area under the receiver operating characteristic curve (AUC). The mean absolute error (MAE) was used to evaluate the DSC value outputs. Results: The proposed methods involved two types of output, both of which achieved promising accuracy in terms of predicting the quality level. For the good, medium, and bad quality level prediction, the balanced accuracy was 0.97, 0.94, and 0.89, respectively; the F score was 0.98, 0.91, and 0.81, respectively; and the AUC was 0.96, 0.93, and 0.88, respectively. For the DSC value prediction, the MAE was 0.06 ± 0.19. The prediction time was approximately 2 s per patient. Conclusions: Our method could predict the segmentation quality automatically. It can provide useful information for physicians regarding further verification and revision of automatic contours. The integration of our method into current automatic segmentation pipelines can improve the efficiency of radiotherapy contouring.
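As an illustration of the slice-level quality levels described in this abstract, the following is a minimal Python sketch that computes a per-slice DSC between two binary masks and bins it into "good," "medium," or "bad." It is not the authors' QA network, which predicts these levels without a reference contour; it only shows how the target labels could be derived when a reference is available. The function names are hypothetical, and the handling of the shared boundaries (0.95 counted as "good," 0.8 as "medium") and of two empty slices (counted as DSC = 1) are assumptions, since the abstract does not specify them.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)

def quality_level(dsc):
    """Bin a DSC value into the three levels named in the abstract.

    Assumption: boundary values go to the higher level.
    """
    if dsc >= 0.95:
        return "good"
    if dsc >= 0.8:
        return "medium"
    return "bad"

def slice_quality(pred_volume, ref_volume):
    """Return one quality label per slice (first axis indexes slices)."""
    return [quality_level(dice(p, r)) for p, r in zip(pred_volume, ref_volume)]
```

Calling `slice_quality(pred, ref)` on two arrays of shape `(slices, rows, cols)` returns a list of labels, one per slice, that a reviewer could use to decide which slices to inspect first.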
Assessing Quantitative Performance and Expert Review of Multiple Deep Learning-Based Frameworks for Computed Tomography-based Abdominal Organ Auto-Segmentation

Udbhav S Ram,Joel A Pogue,Michael Soike,Neil T Pfister,Rojymon Jacob,Carlos E Cardenas

DOI: https://doi.org/10.1101/2024.10.02.24312658

2024-10-02

Abstract:Segmentation of abdominal organs in clinical oncological workflows is crucial for ensuring effective treatment planning and follow-up. However, manually generated segmentations are time-consuming and labor-intensive, and they are subject to inter-observer variability. Many deep learning (DL) and Automated Machine Learning (AutoML) frameworks have emerged as a solution to this challenge and show promise in clinical workflows. This study presents a comprehensive evaluation of existing AutoML frameworks (Auto3DSeg, nnU-Net) against a state-of-the-art non-AutoML framework, the Shifted Window U-Net Transformer (SwinUNETR), each trained on the same 122 training images, taken from the Abdominal Multi-Organ Segmentation (AMOS) grand challenge. Frameworks were compared using Dice Similarity Coefficient (DSC), Surface DSC (sDSC) and 95th Percentile Hausdorff Distances (HD95) on an additional 72 holdout-validation images. The perceived clinical viability of 30 auto-contoured test cases was assessed by three physicians in a blinded evaluation. Comparisons show significantly better performance by AutoML methods: nnU-Net (average DSC: 0.924, average sDSC: 0.938, average HD95: 4.26, median Likert: 4.57), Auto3DSeg (average DSC: 0.902, average sDSC: 0.919, average HD95: 8.76, median Likert: 4.49), and SwinUNETR (average DSC: 0.837, average sDSC: 0.844, average HD95: 13.93). AutoML frameworks were quantitatively preferred (13/13 OARs p>0.05 in DSC and sDSC, 12/13 OARs p>0.05 in HD95, comparing Auto3DSeg to SwinUNETR, and all OARs p>0.05 in all metrics comparing SwinUNETR to nnU-Net). Qualitatively, nnU-Net was preferred over Auto3DSeg (p=0.0027). The findings suggest that AutoML frameworks offer a significant advantage in the segmentation of abdominal organs, and underscore the potential of AutoML methods to enhance the efficiency of oncological workflows.

Oncology
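For readers unfamiliar with the HD95 metric used in the abstract above, the following is a minimal Python sketch of one common way to compute it from two sets of contour points. It is not taken from the study: several HD95 definitions are in use, and this sketch assumes the variant that takes the larger of the two directed 95th-percentile distances. The function names are hypothetical, and the brute-force distance matrix is only practical for small point sets.

```python
import numpy as np

def directed_distances(a_pts, b_pts):
    """Distance from each point in a_pts to its nearest point in b_pts."""
    a = np.asarray(a_pts, dtype=float)
    b = np.asarray(b_pts, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def hd95(a_pts, b_pts):
    """95th-percentile Hausdorff distance between two point sets.

    Assumption: the maximum of the two directed 95th percentiles is used.
    """
    forward = np.percentile(directed_distances(a_pts, b_pts), 95)
    backward = np.percentile(directed_distances(b_pts, a_pts), 95)
    return float(max(forward, backward))
```

The result is in the units of the input coordinates, so voxel indices should be scaled by the voxel spacing before calling `hd95` if a distance in millimetres is wanted.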
Macrocyclic compounds as anti-cancer agents: design and synthesis of multi-acting inhibitors against HDAC, FLT3 and JAK2.

Chengqing Ning,Cheng Lu,Liang Hu,Yanjing Bi,Lei Yao,Yujun He,Li-fei Liu,Xiaoyu Liu,Niefang Yu

DOI: https://doi.org/10.1016/j.ejmech.2015.03.034

IF: 7.088

2015-05-05

European Journal of Medicinal Chemistry

Abstract:
Patient-specific daily updated deep learning auto-segmentation for MRI-guided adaptive radiotherapy

Zhenjiang Li,Wei Zhang,Baosheng Li,Jian Zhu,Yinglin Peng,Chengze Li,Jennifer Zhu,Qichao Zhou,Yong Yin

DOI: https://doi.org/10.1016/j.radonc.2022.11.004

Abstract:Background and purpose: Deep learning (DL) has shown great potential but still has limited success in online contouring for MR-guided adaptive radiotherapy (MRgART). This study proposed a patient-specific DL auto-segmentation (DLAS) strategy that uses the patient's previous images and contours to update the model and improve segmentation accuracy and efficiency for MRgART. Methods and materials: A prototype model was trained for each patient using the first set of MRI and corresponding contours as inputs. The patient-specific model was updated after each fraction with all the available fractional MRIs/contours, and then used to predict the segmentation for the next fraction. During model training, a variant was fitted under consistency constraints, limiting the differences in the volume, length and centroid between the predictions for the latest MRI within a reasonable range. The model performance was evaluated for auto-segmentation of both organs at risk and tumors in a total of 6 abdominal/pelvic cases (each with at least 8 sets of MRIs/contours) that underwent MRgART, using the Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD95), and was compared with deformable image registration (DIR) and a frozen DL model (no updating after pre-training). The contouring time was also recorded and analyzed. Results: The proposed model achieved superior performance with higher mean DSC (0.90, 95% CI: 0.88-0.95), as compared to DIR (0.63, 95% CI: 0.59-0.68) and frozen DL models (0.74, 95% CI: 0.71-0.79). As for tumors, the proposed method yielded a median DSC of 0.95 (95% CI: 0.94-0.97) and a median HD95 of 1.63 mm (95% CI: 1.22 mm-2.06 mm). The contouring time was reduced significantly (p < 0.05) using the proposed method (73.4 ± 6.5 secs) compared to the manual process (12 ∼ 22 mins). The online ART time was reduced to 1650 ± 274 seconds with the proposed method, as compared to 3251.8 ± 447 seconds using the original workflow. Conclusion: The proposed patient-specific DLAS method can significantly improve the segmentation accuracy and efficiency for longitudinal MRIs, thereby facilitating the routine practice of MRgART.
QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction

Peijie Qiu,Satrajit Chakrabarty,Phuc Nguyen,Soumyendu Sekhar Ghosh,Aristeidis Sotiras

2024-12-10

Abstract:Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts of the segmentation that may require refinement. To address these limitations, we proposed a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we conducted experiments assessing its performance in evaluating the quality of two distinct segmentation tasks. First, we aimed to assess the quality of brain tumor segmentation results. For this task, we performed experiments on one internal and two external datasets. Second, we aimed to evaluate the segmentation quality of cardiac MRI data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.

Image and Video Processing,Computer Vision and Pattern Recognition,Machine Learning
The Use of Quantitative Metrics and Machine Learning to Predict Radiologist Interpretations of MRI Image Quality and Artifacts

Lucas McCullum,John Wood,Maria Gule-Monroe,Ho-Ling Anthony Liu,Melissa Chen,Komal Shah,Noah Nathan Chasen,Vinodh Kumar,Ping Hou,Jason Stafford,Caroline Chung,Moiz Ahmad,Christopher Walker,Joshua Yung

2023-11-21

Abstract:A dataset of 3D-GRE and 3D-TSE brain 3T post-contrast T1-weighted images was collected as part of a quality improvement project and shown to five neuro-radiologists, who evaluated each sequence for both image quality and imaging artifacts. The same scans were processed using the MRQy tool for objective, quantitative image quality metrics. Using the combined radiologist and quantitative metrics dataset, a decision tree classifier with a bagging ensemble approach was trained to predict radiologist assessment using the quantitative metrics. A machine learning model was developed for the following three tasks: (1) determine the best model / performance for each MRI sequence and evaluation metric, (2) determine the best model / performance across all MRI sequences for each evaluation metric, and (3) determine the best general model / performance across all MRI sequences and evaluations. Model performance for imaging artifact was slightly higher than for image quality; for example, the final generalized model AUROC for image quality was 0.77 (0.41 - 0.84, 95% CI) while that for imaging artifact was 0.78 (0.60 - 0.93, 95% CI). Further, it was noted that the generalized model performed slightly better than the individual models (AUROC 0.69 for 3D-GRE image quality, for example), indicating the value of comprehensive training data for these applications. These models could be deployed in the clinic as automatic checks during real-time image acquisition, either to avoid re-scanning the patient at another appointment after retrospective radiologist analysis or to improve reader confidence in the study. Further work needs to be done to validate the model described here on an external dataset. The results presented here suggest that MRQy could be used as a foundation for quantitative metrics as a surrogate for radiologist assessment.

Medical Physics
Segmentation Quality and Volumetric Accuracy in Medical Imaging

Zheyuan Zhang,Ulas Bagci

2024-05-14

Abstract:Current medical image segmentation relies on the region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de-facto standard. While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement. Clinicians often lack clear benchmarks to gauge the "goodness" of segmentation results based on these metrics. Recognizing the clinical relevance of volumetry, we utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks. Our work integrates theoretical analysis and empirical validation across diverse datasets. We delve into the often-ambiguous relationship between segmentation quality (measured by Dice) and volumetric accuracy in clinical practice. Our findings highlight the critical role of incorporating volumetric prediction accuracy into segmentation evaluation. This approach empowers clinicians with a more nuanced understanding of segmentation performance, ultimately improving the interpretation and utility of these metrics in real-world healthcare settings.

Image and Video Processing,Computer Vision and Pattern Recognition
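To make the relationship between Dice and volumetric accuracy in the preceding abstract concrete, the following is a minimal Python sketch. The abstract does not give a formula for relative volume prediction error, so the signed definition used here, (predicted volume − reference volume) / reference volume, is an assumption, as are the function names and the toy masks. The example shows a prediction whose volume matches the reference exactly while overlapping it poorly.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two non-empty binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))

def relative_volume_error(pred, ref):
    """Signed relative volume error in voxel counts.

    Assumption: (V_pred - V_ref) / V_ref; the paper may define it differently.
    """
    ref_vol = np.count_nonzero(ref)
    if ref_vol == 0:
        raise ValueError("reference mask is empty")
    return (np.count_nonzero(pred) - ref_vol) / ref_vol

# Toy example: a 4x4 square and the same square shifted diagonally by 2.
ref = np.zeros((10, 10), dtype=bool)
ref[2:6, 2:6] = True
shifted = np.zeros_like(ref)
shifted[4:8, 4:8] = True

print("volume error:", relative_volume_error(shifted, ref))
print("Dice:", dice(shifted, ref))
```

Here the two masks have identical voxel counts, so the volume error is zero even though only a quarter of each mask overlaps; the two kinds of metric answer different questions.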
