Comparison of diagnostic accuracy and utility of artificial intelligence–optimized ACR TI-RADS and original ACR TI-RADS: a multi-center validation study based on 2061 thyroid nodules

Ying Liu, Xiaoxian Li, Cuiju Yan, Longzhong Liu, Ying Liao, Hongyan Zeng, Weijun Huang, Qian Li, Nansheng Tao, Jianhua Zhou
DOI: https://doi.org/10.1007/s00330-022-08827-y
IF: 7.034
2022-05-05
European Radiology
Abstract:
Objective: To determine whether an artificial intelligence–based modification of the Thyroid Imaging Reporting and Data System (AI TI-RADS) performs better than the current American College of Radiology (ACR) TI-RADS for risk stratification of thyroid nodules.
Methods: A total of 2061 thyroid nodules (in 1859 patients) sampled by fine-needle aspiration or surgery between January 2017 and July 2020 were retrospectively analyzed. Two radiologists blinded to the pathologic diagnosis evaluated nodule features in five ultrasound categories and assigned TI-RADS scores using both ACR TI-RADS and AI TI-RADS. The reference standard was the postoperative pathological diagnosis or the cytopathological diagnosis according to the Bethesda system. Inter-rater agreement was assessed by having two additional radiologists independently score a set of 100 nodules, and was quantified using the intraclass correlation coefficient (ICC).
Results: AI TI-RADS assigned lower TI-RADS risk levels than ACR TI-RADS (p < 0.001) and had a larger area under the receiver operating characteristic curve (0.762 vs. 0.679, p < 0.001). The sensitivities of ACR TI-RADS and AI TI-RADS were similar (86.7% vs. 82.2%, p = 0.052), but specificity was higher with AI TI-RADS (70.2% vs. 49.2%, p < 0.001). AI TI-RADS downgraded 743 (48.63%) benign nodules, indicating that 328 unnecessary fine-needle aspirations (FNA), or 42.3% of the 776 biopsied nodules, could have been avoided. Inter-rater agreement was better with AI TI-RADS than with ACR TI-RADS (ICC, 0.808 vs. 0.861, p < 0.001).
Conclusion: AI TI-RADS can achieve a meaningful reduction in the number of benign thyroid nodules recommended for biopsy and significantly improve specificity, despite a slight decrease in sensitivity.
Key Points:
• AI TI-RADS assigned lower TI-RADS risk levels than ACR TI-RADS, showing similar sensitivity but higher specificity.
• About half of the benign nodules were downgraded, and 42.3% of the biopsied nodules could have avoided unnecessary fine-needle aspiration (FNA).
• AI TI-RADS had better overall inter-rater agreement.
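The percentages quoted in the Results can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis code: the function and variable names are illustrative, and the benign-nodule total is not stated in the abstract but only implied by the reported count and share of downgraded nodules.

```python
def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def sensitivity(true_pos, false_neg):
    """Fraction of malignant nodules correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of benign nodules correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Figures quoted in the abstract.
avoided_fna = 328          # unnecessary FNAs that could have been avoided
biopsied = 776             # biopsied nodules used as the denominator
downgraded_benign = 743    # benign nodules downgraded by AI TI-RADS
downgraded_share = 0.4863  # reported as 48.63% of benign nodules

# Share of biopsied nodules whose FNA could have been avoided.
avoided_pct = percent(avoided_fna, biopsied)

# Assumption: the benign total is inferred here, not reported in the abstract.
implied_benign_total = round(downgraded_benign / downgraded_share)

print(f"Avoidable FNA: {avoided_pct}% of biopsied nodules")
print(f"Implied number of benign nodules: {implied_benign_total}")
```

The `sensitivity` and `specificity` helpers show only the standard definitions behind the reported values; the abstract does not give the underlying confusion-matrix counts, so they are not applied to the study data here.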
radiology, nuclear medicine & medical imaging