Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls

Farhad Maleki, Katie Ovens, Rajiv Gupta, Caroline Reinhold, Alan Spatz, Reza Forghani
DOI: https://doi.org/10.1148/ryai.220028
2022-12-20
Radiology: Artificial Intelligence
Abstract:
Purpose: To investigate the impact of the following three methodological pitfalls on model generalizability: (a) violation of the independence assumption, (b) model evaluation with an inappropriate performance indicator or baseline for comparison, and (c) batch effect.
Materials and Methods: The authors used retrospective CT, histopathologic analysis, and radiography datasets to develop machine learning models with and without the three methodological pitfalls to quantitatively illustrate their effect on model performance and generalizability. F1 score was used to measure performance, and differences in performance between models developed with and without errors were assessed using the Wilcoxon rank sum test when applicable.
Results