Automating Systematic Literature Reviews with Natural Language Processing and Text Mining: a Systematic Literature Review

Girish Sundaram, Daniel Berleant
2023-07-31
Abstract: Objectives: A systematic literature review (SLR) is presented that focuses on text mining (TM)-based automation of SLR creation. The review identifies the objectives of the automation studies and which aspects of the SLR steps were automated. In doing so, it explains the machine learning (ML) techniques used, along with the challenges, limitations, and scope for further research. Methods: The review covers accessible published studies that primarily focus on automating the study selection, study quality assessment, data extraction, and data synthesis portions of an SLR. Twenty-nine studies were analyzed. Results: This review identifies the objectives of the automation studies; the steps within study selection, study quality assessment, data extraction, and data synthesis that were automated; the ML techniques used; and the challenges, limitations, and scope for further research. Discussion: We describe uses of natural language processing (NLP) and TM techniques to support increased automation of systematic literature reviews. This area has attracted increased attention in the last decade because of significant gaps in the applicability of TM to automating steps in the SLR process. Significant gaps remain in the application of TM and related automation techniques to data extraction, monitoring, quality assessment, and data synthesis. Continued progress in this area is therefore needed, and it is expected ultimately to make systematic literature reviews significantly easier to construct.
Information Retrieval
What problem does this paper attempt to address?
The paper addresses the automation of several stages of the systematic literature review (SLR) process, particularly study selection, study quality assessment, data extraction, and data synthesis. Specifically, it systematically reviews existing methods that use natural language processing (NLP) and text mining to automate these stages, in order to identify the challenges, limitations, and areas for further research in the automation process.

- **Research Motivation**: The paper points out that the current SLR process is time-consuming and costly, especially study selection, data extraction, and synthesis, which involve a substantial amount of manual work. Automation is therefore needed to improve efficiency.
- **Main Focus**: The paper focuses on automating specific stages of the SLR process, including study selection (SLR6), data extraction and monitoring (SLR8), and data synthesis (SLR9).
- **Technical Means**: The paper covers various machine learning (ML) and text mining methods, such as support vector machines (SVM), logistic regression, Naive Bayes, and random forests.
- **Evaluation Method**: Cross-validation is the primary evaluation method, used alongside a range of metrics that measure model performance.
- **Research Results**: Through its analysis of 29 related studies, the paper highlights the progress and shortcomings in SLR automation and outlines directions for future research.

Overall, by systematically reviewing existing automation methods and technologies, the paper aims to support more efficient and reliable solutions for the key steps of the SLR process.
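The summary names the classifiers and the evaluation approach but not how they fit together. As a rough illustration only, the following minimal pure-Python sketch applies one of the named techniques, multinomial Naive Bayes, to title/abstract screening (the study selection step) and scores it with k-fold cross-validation using precision and recall. The toy corpus, the include/exclude labels, the tokenizer, the class name `NaiveBayesScreener`, and the modulo-based fold assignment are all hypothetical assumptions, not taken from the paper or any of the 29 reviewed studies; a practical system would more likely use richer features and an established library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesScreener:
    """Hypothetical multinomial Naive Bayes screener with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for token in tokenize(doc):
                if token in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][token] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def cross_validate(docs, labels, k=3, positive="include"):
    """k-fold cross-validation; returns (precision, recall) pooled over folds."""
    tp = fp = fn = 0
    for fold in range(k):
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        model = NaiveBayesScreener().fit(
            [docs[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        for i in test_idx:
            pred = model.predict(docs[i])
            if pred == positive and labels[i] == positive:
                tp += 1
            elif pred == positive:
                fp += 1
            elif labels[i] == positive:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus: abstracts and labels are invented for illustration.
docs = [
    "machine learning classifier automates citation screening for systematic reviews",
    "randomized trial of a new drug for hypertension",
    "text mining supports study selection in systematic literature reviews",
    "survey of cloud storage pricing models",
    "active learning reduces screening workload for systematic reviews",
    "clinical outcomes of knee surgery in older adults",
]
labels = ["include", "exclude", "include", "exclude", "include", "exclude"]

model = NaiveBayesScreener().fit(docs, labels)
print(model.predict("automating screening of systematic reviews with text mining"))  # → include
print(cross_validate(docs, labels, k=3))
```

Recall is reported alongside precision because, in screening, missing a relevant study is usually considered costlier than passing an irrelevant one on to a human reviewer.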