Deep Learning-Based Interpretable AI for Prostate T2W MRI Quality Evaluation
Mason J Belue, Yan Mee Law, Jamie Marko, Evrim Turkbey, Ashkan Malayeri, Enis C Yilmaz, Yue Lin, Latrice Johnson, Katie M Merriman, Nathan S Lay, Bradford J Wood, Peter A Pinto, Peter L Choyke, Stephanie A Harmon, Baris Turkbey
DOI: https://doi.org/10.1016/j.acra.2023.09.030
Abstract: Rationale and objectives: Prostate MRI quality is essential in guiding prostate biopsies. However, assessment of MRI quality is subjective and variable. Sources of quality degradation have different impacts depending on the sequence under consideration, such as T2W versus DWI, so sequence-specific techniques for quality assessment may be more effective. This study aims to develop an AI tool that offers a more consistent evaluation of T2W prostate MRI quality, efficiently identifying suboptimal scans while minimizing user bias. Materials and methods: This retrospective study included 1046 patients from three cohorts (ProstateX [n = 347], all-comer in-house [n = 602], enriched bad-quality MRI in-house [n = 97]) scanned between January 2011 and May 2022. An expert reader assigned each T2W MRI a quality score. A train-validation-test split of 70:15:15 was applied, ensuring equal distribution of MRI scanners and protocols across all partitions. The T2W quality AI classification model was based on the 3D DenseNet121 architecture using the MONAI framework. In addition to multiclass classification, binary classification was utilized (Classes 0/1 vs. 2). A score of 0 was given to scans considered non-diagnostic or unusable, a score of 1 to those of acceptable diagnostic quality that were usable but showed some quality distortions, and a score of 2 to those considered of optimal diagnostic quality and usability. Partial occlusion sensitivity maps were generated for anatomical correlation. Three body radiologists assessed reproducibility within a subgroup of 60 test cases using weighted Cohen's kappa. Results: The best validation multiclass accuracy of 77.1% (121/157) was achieved during training. In the test dataset, multiclass accuracy was 73.9% (116/157), whereas binary accuracy was 84.7% (133/157).
Sub-class sensitivity for binary classification of quality distortion was 100% (18/18) for class 0, and sub-class specificity for T2W classification of absent or minimal quality distortions was 90.5% (95/105) for class 2. All three readers showed moderate to substantial agreement with ground truth (R1, R2, and R3: κ = 0.588, 0.649, and 0.487, respectively), moderate to substantial agreement with each other (R1-R2 κ = 0.599, R1-R3 κ = 0.612, R2-R3 κ = 0.685), and fair to moderate agreement with AI (R1, R2, and R3: κ = 0.445, 0.410, and 0.292, respectively). AI showed substantial agreement with ground truth (κ = 0.704). 3D quality heatmap evaluation revealed that the most critical non-diagnostic imaging features from an AI perspective were related to obscuration of the rectoprostatic space (94.4%, 17/18). Conclusion: The 3D AI model can assess T2W prostate MRI quality with moderate accuracy and translate whole sequence-level classification labels into 3D voxel-level quality heatmaps for interpretation. Image quality has a significant downstream impact on ruling out clinically significant cancers. AI may be able to help with reproducible identification of MRI sequences requiring re-acquisition with explainability.
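
Illustrative sketch (not the authors' code): the abstract reports agreement as weighted Cohen's kappa on a three-level quality score and collapses scores into a binary label (Classes 0/1 vs. 2). The Python below shows one way these quantities could be computed. The function names (to_binary, weighted_kappa, percent) are hypothetical, the coding of the binary label as 0/1 is an assumption, and the weighting scheme is assumed to be linear by default because the abstract does not state whether linear or quadratic weights were used.

```python
import math


def to_binary(score):
    """Collapse a 0/1/2 quality score: 0 or 1 -> 0 (distortion present), 2 -> 1 (optimal)."""
    if score not in (0, 1, 2):
        raise ValueError("score must be 0, 1, or 2")
    return 1 if score == 2 else 0


def weighted_kappa(rater_a, rater_b, n_classes=3, weighting="linear"):
    """Weighted Cohen's kappa for two raters using ordinal labels 0..n_classes-1.

    The weighting scheme is an assumption of this sketch; pass "quadratic"
    to use squared disagreement weights instead of linear ones.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    n = len(rater_a)

    # Observed joint distribution of the two raters' labels.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for i, j in zip(rater_a, rater_b):
        if not (0 <= i < n_classes and 0 <= j < n_classes):
            raise ValueError("label out of range")
        observed[i][j] += 1.0 / n

    # Marginal label distributions for each rater.
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the largest possible gap.
        d = abs(i - j) / (n_classes - 1)
        return d if weighting == "linear" else d * d

    obs_dis = sum(weight(i, j) * observed[i][j]
                  for i in range(n_classes) for j in range(n_classes))
    exp_dis = sum(weight(i, j) * row[i] * col[j]
                  for i in range(n_classes) for j in range(n_classes))
    if exp_dis == 0.0:
        # Both raters used one identical class throughout; kappa is undefined.
        return math.nan
    return 1.0 - obs_dis / exp_dis


def percent(correct, total):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)
```

As a consistency check, percent(116, 157) and percent(133, 157) reproduce the reported test accuracies of 73.9% and 84.7%. The per-case reader and AI scores are not given in the abstract, so the reported κ values cannot be recomputed from it.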