Effects of a comprehensive brain computed tomography deep learning model on radiologist detection accuracy
Quinlan D Buchlak, Cyril H M Tang, Jarrel C Y Seah, Andrew Johnson, Xavier Holt, Georgina M Bottrell, Jeffrey B Wardman, Gihan Samarasinghe, Leonardo Dos Santos Pinheiro, Hongze Xia, Hassan K Ahmad, Hung Pham, Jason I Chiang, Nalan Ektas, Michael R Milne, Christopher H Y Chiu, Ben Hachey, Melissa K Ryan, Benjamin P Johnston, Nazanin Esmaili, Christine Bennett, Tony Goldschlager, Jonathan Hall, Duc Tan Vo, Lauren Oakden-Rayner, Jean-Christophe Leveque, Farrokh Farrokhi, Richard G Abramson, Catherine M Jones, Simon Edelstein, Peter Brotchie
DOI: https://doi.org/10.1007/s00330-023-10074-8
Abstract:
Objectives: Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists.
Methods: A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding using a 7-point scale. Differences in area under the receiver operating characteristic curve (AUC) and Matthews correlation coefficient (MCC) were calculated using a ground-truth gold standard.
Results: The model demonstrated an average AUC of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated an average AUC of 0.79 and 0.73 across 22 grouped parent findings and 0.72 and 0.68 across 189 child findings, respectively. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced.
Conclusions: The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation.
Clinical relevance statement: This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved the performance of radiologists, and reduced interpretation time. The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care.
Key points:
• This study demonstrated that the use of a comprehensive deep learning system assisted radiologists in the detection of a wide range of abnormalities on non-contrast brain computed tomography scans.
• The deep learning model demonstrated an average area under the receiver operating characteristic curve of 0.93 across 144 findings and significantly improved radiologist interpretation performance.
• The assistance of the comprehensive deep learning model significantly reduced the time required for radiologists to interpret computed tomography scans of the brain.
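For readers unfamiliar with the two metrics named in the abstract, the sketch below shows one way AUC and MCC could be computed for a single finding from 7-point confidence ratings scored against a ground-truth label. It is a minimal illustration, not the study's analysis pipeline: the function names, the toy ratings, and the operating threshold of 4 are all assumptions made here, and the abstract does not specify the statistical procedure used to compare assisted and unassisted reads.

```python
from math import sqrt


def auc_from_ratings(ratings, labels):
    """AUC as the probability that a randomly chosen positive case is rated
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [r for r, y in zip(ratings, labels) if y == 1]
    neg = [r for r, y in zip(ratings, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc_from_ratings(ratings, labels, threshold):
    """Matthews correlation coefficient after binarising ratings:
    a rating >= threshold is treated as 'finding present'."""
    tp = fp = tn = fn = 0
    for r, y in zip(ratings, labels):
        predicted = 1 if r >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical data for ONE finding on eight scans (invented for illustration,
# not taken from the study). 1 = finding present according to ground truth.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
unassisted = [6, 4, 3, 2, 5, 1, 3, 7]  # 7-point confidence, read without the model
assisted = [6, 5, 4, 2, 4, 1, 2, 7]    # same reader, read with model assistance

THRESHOLD = 4  # assumed cut-off for calling a finding present

auc_diff = auc_from_ratings(assisted, labels) - auc_from_ratings(unassisted, labels)
mcc_diff = (mcc_from_ratings(assisted, labels, THRESHOLD)
            - mcc_from_ratings(unassisted, labels, THRESHOLD))
print(f"AUC difference (assisted - unassisted): {auc_diff:.3f}")
print(f"MCC difference (assisted - unassisted): {mcc_diff:.3f}")
```

AUC uses the full ordinal rating and so needs no threshold, whereas MCC depends on where the rating scale is cut; a different threshold would give a different MCC for the same reads.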