Conceptual review of outcome metrics and measures used in clinical evaluation of artificial intelligence in radiology

Seong Ho Park, Kyunghwa Han, June-Goo Lee
DOI: https://doi.org/10.1007/s11547-024-01886-9
2024-09-03
Abstract: Artificial intelligence (AI) has numerous applications in radiology. Clinical research studies to evaluate the AI models are also diverse. Consequently, diverse outcome metrics and measures are employed in the clinical evaluation of AI, presenting a challenge for clinical radiologists. This review aims to provide conceptually intuitive explanations of the outcome metrics and measures that are most frequently used in clinical research, specifically tailored for clinicians. While we briefly discuss performance metrics for AI models in binary classification, detection, or segmentation tasks, our primary focus is on less frequently addressed topics in published literature. These include metrics and measures for evaluating multiclass classification; those for evaluating generative AI models, such as models used in image generation or modification and large language models; and outcome measures beyond performance metrics, including patient-centered outcome measures. Our explanations aim to guide clinicians in the appropriate use of these metrics and measures.