Comparison of large language models for citation screening: A protocol for a prospective study

Takehiko Oami, Yohei Okada, Taka-aki Nakada
DOI: https://doi.org/10.1101/2024.06.26.24309513
2024-06-26
Abstract:
Background: Systematic reviews require labor-intensive and time-consuming processes. Large language models (LLMs) have been recognized as promising tools for citation screening; however, their performance in screening citations remains to be determined. This study aims to evaluate the potential of three leading LLMs (GPT-4, Gemini 1.5 Pro, and Claude 3.5) for literature screening.
Methods: We will conduct a prospective study comparing the accuracy, efficiency, and cost of literature citation screening using the three LLMs. Each model will perform literature searches for predetermined clinical questions from the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock (J-SSCG). We will measure and compare the time required for citation screening using each method. The sensitivity and specificity of the results from the conventional approach and each LLM-assisted process will be calculated and compared. Additionally, we will assess the total time spent and the associated costs of each method to evaluate workload reduction and economic efficiency.
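As an illustration of how sensitivity and specificity could be computed for one screening method, the sketch below compares per-citation include/exclude decisions against a reference standard. It assumes that each citation has a single boolean reference label (for example, whether it was ultimately included under the conventional review process) and a boolean decision from the method being evaluated; the function and class names are hypothetical and are not taken from the protocol.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    """Confusion-matrix counts plus derived metrics (hypothetical container)."""
    tp: int
    fn: int
    fp: int
    tn: int
    sensitivity: float
    specificity: float


def screening_metrics(reference: list[bool], predicted: list[bool]) -> ScreeningMetrics:
    """Compare screening decisions (True = include) with reference labels.

    Assumption: reference[i] and predicted[i] refer to the same citation.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # relevant, included
    fn = sum(1 for r, p in pairs if r and not p)      # relevant, wrongly excluded
    fp = sum(1 for r, p in pairs if not r and p)      # irrelevant, wrongly included
    tn = sum(1 for r, p in pairs if not r and not p)  # irrelevant, excluded
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    # NaN signals that the denominator is empty (no positives or no negatives).
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return ScreeningMetrics(tp, fn, fp, tn, sensitivity, specificity)


if __name__ == "__main__":
    # Toy labels for five citations (illustrative only, not study data).
    ref = [True, True, False, False, False]
    llm = [True, False, True, False, False]
    m = screening_metrics(ref, llm)
    print(f"sensitivity={m.sensitivity:.2f}, specificity={m.specificity:.2f}")
    # prints "sensitivity=0.50, specificity=0.67"
```

In citation screening, a missed relevant study (a false negative) is usually costlier than an extra full-text review, so sensitivity is typically the metric watched most closely; the same function can be applied to each LLM and to the conventional process in turn.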