From technical to understandable: Artificial Intelligence Large Language Models improve the readability of knee radiology reports

James J. Butler, James Puleo, Michael C. Harrington, Jari Dahmen, Andrew J. Rosenbaum, Gino M. M. J. Kerkhoffs, John G. Kennedy
DOI: https://doi.org/10.1002/ksa.12133
2024-03-17
Knee Surgery Sports Traumatology Arthroscopy
Abstract: Purpose: The purpose of this study was to evaluate the effectiveness of an Artificial Intelligence‐Large Language Model (AI‐LLM) at improving the readability of knee radiology reports. Methods: Reports of 100 knee X‐rays, 100 knee computed tomography (CT) scans and 100 knee magnetic resonance imaging (MRI) scans were retrieved. The following prompt command was inserted into the AI‐LLM: 'Explain this radiology report to a patient in layman's terms in the second person:[Report Text]'. The Flesch–Kincaid reading level (FKRL) score, Flesch reading ease (FRE) score and report length were calculated for the original radiology report and the AI‐LLM‐generated report. Any 'hallucination' or inaccurate text in the AI‐LLM‐generated report was documented. Results: Mean FKRL scores improved significantly in the AI‐LLM‐generated X‐ray report (from 12.7 ± 1.0 to 7.2 ± 0.6), CT report (from 13.4 ± 1.0 to 7.5 ± 0.5) and MRI report (from 13.5 ± 0.9 to 7.5 ± 0.6). Mean FRE scores also improved significantly in the AI‐LLM‐generated X‐ray report (from 39.5 ± 7.5 to 76.8 ± 5.1), CT report (from 27.3 ± 5.9 to 73.1 ± 5.6) and MRI report (from 26.8 ± 6.4 to 73.4 ± 5.0). Superior FKRL scores and FRE scores were observed in the AI‐LLM‐generated X‐ray report compared to the AI‐LLM‐generated CT report and MRI report, p
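The two readability metrics named in the abstract have standard published formulas: FRE = 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), and FKRL = 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. The abstract does not say which tool the authors used to compute them, so the following is only a minimal sketch of how such scores could be calculated. The function names `count_syllables` and `readability` are hypothetical, and the vowel-group syllable counter is a rough assumption that will differ from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch reading ease, Flesch-Kincaid reading level and word count for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher FRE means easier text.
        "fre": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # FKRL approximates a US school grade level; lower means easier text.
        "fkrl": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Report length, measured here in words (an assumption about the unit).
        "words": len(words),
    }


if __name__ == "__main__":
    # Illustrative sentences only; these are not taken from the study's reports.
    original = "There is moderate tricompartmental osteoarthritis with associated joint effusion."
    simplified = "Your knee has some wear and tear."
    print(readability(original))
    print(readability(simplified))
```

With this sketch, a simplified sentence should score a higher FRE and a lower FKRL than a jargon-heavy one, which is the direction of change the abstract reports. Absolute values depend on the syllable heuristic and should not be compared with the study's figures.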
surgery, orthopedics, sport sciences