Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews

Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
2024-07-15
Abstract: In academic research, systematic literature reviews are foundational and highly relevant, yet tedious to create due to the high volume of publications and labor-intensive processes involved. Systematic selection of relevant papers through conventional means like keyword-based filtering techniques can sometimes be inadequate, plagued by semantic ambiguities and inconsistent terminology, which can lead to sub-optimal outcomes. To mitigate the required extensive manual filtering, we explore and evaluate the potential of using Large Language Models (LLMs) to enhance the efficiency, speed, and precision of literature review filtering, reducing the amount of manual screening required. By using models as classification agents acting on a structured database only, we prevent common problems inherent in LLMs, such as hallucinations. We evaluate the real-world performance of such a setup during the construction of a recent literature survey paper with initially more than 8.3k potentially relevant articles under consideration and compare this with human performance on the same dataset. Our findings indicate that employing advanced LLMs like GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Flash, or Llama3 with simple prompting can significantly reduce the time required for literature filtering - from usually weeks of manual research to only a few minutes. Simultaneously, we crucially show that false negatives can indeed be controlled through a consensus scheme, achieving recalls >98.8% at or even beyond the typical human error threshold, thereby also providing for more accurate and relevant articles selected. Our research not only demonstrates a substantial improvement in the methodology of literature reviews but also sets the stage for further integration and extensive future applications of responsible AI in academic research practices.
Machine Learning, Digital Libraries, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the inefficiency of Systematic Literature Reviews (SLRs), in particular the time-consuming manual screening needed when a large volume of literature is under consideration. Conventional approaches such as keyword-based filtering suffer from semantic ambiguity and inconsistent terminology, which leads to suboptimal screening results. The paper explores using Large Language Models (LLMs) as classification agents that operate only on a structured database of candidate articles, with the aim of improving the efficiency, speed, and precision of literature screening while avoiding common LLM problems such as hallucinations. Evaluated on a real survey with initially more than 8.3k potentially relevant articles, advanced LLMs (such as GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Flash, or Llama3) with simple prompting reduce the time required for filtering from weeks of manual work to a few minutes. The study also shows that false negatives can be controlled through a consensus scheme, achieving recall above 98.8%, which is at or beyond the typical human error threshold. According to the authors, this improves the methodology of literature reviews and lays the groundwork for the responsible application of AI in future academic research.
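The abstract does not spell out how the consensus scheme combines model outputs, so the following is only a minimal sketch under an assumed rule: a paper is excluded only when every model classifies it as irrelevant, which favors recall over precision. The model names, votes, and ground-truth labels are hypothetical illustration data, not results from the paper.

```python
from typing import Dict, List


def consensus_keep(votes: Dict[str, bool]) -> bool:
    """Keep a paper if at least one model votes 'relevant'.

    Assumed rule (not confirmed by the source): a paper is discarded
    only when all models unanimously vote 'irrelevant'.
    """
    return any(votes.values())


def recall(kept: List[bool], truth: List[bool]) -> float:
    """Fraction of truly relevant papers that survived filtering."""
    relevant = sum(truth)
    if relevant == 0:
        return 1.0  # nothing relevant to miss
    hits = sum(1 for k, t in zip(kept, truth) if k and t)
    return hits / relevant


# Hypothetical per-paper votes from three placeholder models.
votes_per_paper = [
    {"model_a": True, "model_b": True, "model_c": False},
    {"model_a": False, "model_b": False, "model_c": False},
    {"model_a": False, "model_b": True, "model_c": False},
    {"model_a": False, "model_b": False, "model_c": False},
]
# Hypothetical human labels: the last paper is a false negative for all models.
truth = [True, False, True, True]

kept = [consensus_keep(v) for v in votes_per_paper]
print(kept)
print(round(recall(kept, truth), 3))
```

Under this assumed rule, a relevant paper is lost only when all models miss it at once, which is why adding models tends to lower the false negative rate at the cost of passing more irrelevant papers on to manual screening.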