Estimating the Prevalence of Generative AI Use in Medical School Application Essays

Ian S Hagemann, Valerie S Ratts, Nicholas C Spies
DOI: https://doi.org/10.1101/2024.10.21.24315868
2024-10-22
Abstract: Background: Generative artificial intelligence (AI) tools became widely available to the public in November 2022. The extent to which medical school applicants use these tools during the admissions process is unknown. Methods: We retrospectively analyzed 6,000 essays submitted to a U.S. medical school in 2021-2022 (baseline, before wide availability of AI) and in 2023-2024 (test year) to estimate the prevalence of AI use and its relation to other application data. We used GPTZero, a commercially available detection tool, to generate a metric for the likelihood that each essay was human-generated, P_human, ranging from 0 (entirely AI) to 1 (entirely human). Results: Fully human-generated negative controls had a median P_human of 0.93, while AI-generated positive controls had a median P_human of 0.01. Personal Comments essays submitted in the 2023-2024 cycle had a median P_human of 0.77 (95% confidence interval [CI] 0.76-0.78), versus 0.83 (95% CI 0.82-0.85) in the 2021-2022 cycle. Approximately 12.3% of essays in the test year and 2.7% in the baseline year had P_human < 0.5. Secondary essays had lower P_human than Personal Comments essays, suggesting more AI use. In multivariate analysis, younger age, visa requirement, and higher GPA were significantly associated with lower P_human. No differences were observed by gender, MCAT score, undergraduate major, or socioeconomic status. P_human was not predictive of admissions outcomes in univariate or multivariate analyses. Conclusions: An AI detection algorithm estimated significantly increased use of generative AI in 2023-2024 medical school admission applications compared with the 2021-2022 baseline. Estimated AI use was not significantly associated with admissions decisions. While these results provide information about the applicant pool as a whole, AI detection is imperfect. We recommend exercising caution before deploying any AI detection tools on individual applications in live admissions cycles.
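The summary statistics reported above (a median P_human with a 95% confidence interval, and the share of essays with P_human < 0.5) can be illustrated with a minimal Python sketch. The abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, as are the function names and the example scores, which are invented for illustration and are not data from the study.

```python
import random
import statistics


def bootstrap_median_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Assumption: the source does not say which interval method was used;
    the percentile bootstrap is one common choice for a median.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and collect the median of each resample.
    boot_medians = sorted(
        statistics.median(rng.choices(scores, k=n)) for _ in range(n_boot)
    )
    lower = boot_medians[int((alpha / 2) * n_boot)]
    upper = boot_medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(scores), lower, upper


def fraction_below(scores, threshold=0.5):
    """Share of essays whose score falls strictly below the threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Hypothetical P_human scores for illustration only; not data from the study.
example_scores = [0.91, 0.85, 0.12, 0.77, 0.64, 0.95, 0.43, 0.88, 0.70, 0.81]

median, lower, upper = bootstrap_median_ci(example_scores)
print(f"median={median:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
print(f"fraction below 0.5: {fraction_below(example_scores):.2f}")
```

Applied separately to the baseline-year and test-year scores, the same two functions would yield the kind of cohort-level comparison described in the Results. They say nothing reliable about any single essay, which is consistent with the caution urged in the Conclusions.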
Medical Education