Comparing human and AI performance in medical machine learning: An open-source Python library for the statistical analysis of reader study data

McKinney, S. M.
DOI: https://doi.org/10.1101/2022.05.06.22274773
2022-05-08
MedRxiv
Abstract: In seeking to understand the potential effects of artificial intelligence (AI) on the practice of diagnostic medicine, many investigations involve collecting interpretations from several human experts on a common set of cases. In an effort to standardize the process of analyzing the data emerging from such studies, we have released an open-source Python library to perform applicable statistical procedures. The software implements the industry-standard Obuchowski-Rockette-Hillis (ORH) method for multi-reader multi-case (MRMC) studies. The tools can be used to compare a standalone algorithm against a panel of readers, or compare readers operating in two modalities (for example, with and without algorithmic assistance). The software supports both nonequivalence and noninferiority tests. Functions are also provided to simulate reader and model scores, useful for Monte Carlo power analysis. The code is publicly available in our GitHub repository at https://github.com/Google-Health/google-health/tree/master/analysis.
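To make the idea of Monte Carlo power analysis concrete, the following is a minimal sketch, not the library's code or API. It simulates Gaussian reader and model scores, then estimates how often a standalone model would be declared noninferior to the reader panel. It deliberately swaps in a crude one-sided z-test on per-reader AUCs (reader variability only) in place of the ORH method, which also accounts for case variability. Every function name, parameter, and default here (`simulate_scores`, `noninferior`, `estimate_power`, the 0.5 case-effect spread, the margin) is a hypothetical choice made for illustration.

```python
import numpy as np
from statistics import NormalDist


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney statistic), counting ties as one half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def simulate_scores(rng, n_pos, n_neg, n_readers, reader_sep, model_sep):
    """Draw Gaussian scores; positive cases are shifted upward by `*_sep`.

    A shared per-case effect (assumed SD 0.5) induces correlation between
    readers and the model on the same case.
    """
    labels = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])
    case_effect = rng.normal(0.0, 0.5, size=labels.size)
    reader_noise = rng.normal(0.0, 1.0, size=(n_readers, labels.size))
    reader_scores = labels * reader_sep + case_effect + reader_noise
    model_scores = labels * model_sep + case_effect + rng.normal(0.0, 1.0, size=labels.size)
    return labels, reader_scores, model_scores


def noninferior(labels, reader_scores, model_scores, margin, alpha=0.05):
    """Simplified one-sided z-test of H0: AUC_model - mean(AUC_reader) <= -margin.

    The standard error uses reader-to-reader variability only. This is a
    stand-in for ORH, not an implementation of it.
    """
    reader_aucs = np.array([auc(r, labels) for r in reader_scores])
    delta = auc(model_scores, labels) - reader_aucs.mean()
    se = reader_aucs.std(ddof=1) / np.sqrt(len(reader_aucs))
    z = (delta + margin) / se
    return bool(z > NormalDist().inv_cdf(1 - alpha))


def estimate_power(n_trials, margin, seed=0, **sim_kwargs):
    """Fraction of simulated studies in which noninferiority is concluded."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        labels, readers, model = simulate_scores(rng, **sim_kwargs)
        hits += noninferior(labels, readers, model, margin)
    return hits / n_trials


if __name__ == "__main__":
    power = estimate_power(
        n_trials=200, margin=0.05,
        n_pos=50, n_neg=50, n_readers=6, reader_sep=1.0, model_sep=1.2,
    )
    print(f"Estimated power: {power:.2f}")
```

Rerunning `estimate_power` over a grid of case counts and reader counts gives a rough picture of how study size affects the chance of detecting noninferiority. For a real study design, the library's own simulation and ORH routines should be used instead of this simplified test.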