Automated Review Generation Method Based on Large Language Models

Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Zhi-Jian Zhao, Jinlong Gong
2024-07-30
Abstract: Literature research, vital for scientific advancement, is overwhelmed by the vast ocean of available information. Addressing this, we propose an automated review generation method based on Large Language Models (LLMs) to streamline literature processing and reduce cognitive load. In a case study on propane dehydrogenation (PDH) catalysts, our method swiftly generated comprehensive reviews from 343 articles, averaging seconds per article per LLM account. Extended analysis of 1041 articles provided deep insights into catalysts' composition, structure, and performance. Recognizing LLMs' hallucinations, we employed a multi-layered quality control strategy, ensuring our method's reliability and effective hallucination mitigation. Expert verification confirms the accuracy and citation integrity of generated reviews, demonstrating LLM hallucination risks reduced to below 0.5% with over 95% confidence. The released Windows application enables one-click review generation, aiding researchers in tracking advancements and recommending literature. This approach showcases LLMs' role in enhancing scientific research productivity and sets the stage for further exploration.
Computation and Language; Artificial Intelligence; Data Analysis, Statistics and Probability
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the issues of low efficiency in literature processing and information overload in scientific research. Specifically, the authors propose an automated review generation method based on large language models (LLMs) to improve literature processing efficiency and reduce cognitive burden. Through this method, researchers can quickly generate comprehensive review articles, thereby better tracking research progress and recommending relevant literature.

### Background and Challenges

1. **Surge in Literature Quantity**: With the rapid increase in scientific literature, researchers find it difficult to effectively process and integrate this information, leading to redundant discoveries and narrow research perspectives.
2. **Limitations of Traditional Methods**: Traditional literature review methods are time-consuming, require a large amount of specialized manpower, and struggle to keep up with the rapid development of research.
3. **Application of Natural Language Processing (NLP)**: Although NLP technology has made some progress in extracting synthesis methods, material properties, and key reaction parameters, these methods are often limited to specific aspects, requiring prior knowledge and programming skills, which are unfriendly to novices.
4. **Potential and Challenges of LLMs**: Large language models (LLMs) excel in zero-shot and few-shot learning, commonsense reasoning, and multi-task processing, but they have the problem of "hallucination," generating incorrect or irrelevant information.

### Solution

1. **Automated Review Generation Method**: The authors developed an efficient and comprehensive automated review generation method based on LLMs, including an end-to-end pipeline for literature retrieval, reading, summary extraction, and coherent text organization.
2. **Multi-level Quality Control Strategy**: To address the "hallucination" problem of LLMs, the authors adopted a multi-level quality control strategy, including text format filtering, DOI verification, relevance verification, and self-consistency verification, ensuring the generated review content is accurate and reliable.
3. **User-friendly Interface**: A Windows application was developed, allowing users to generate reviews with a single click, without the need for programming skills or domain knowledge.

### Experiments and Results

1. **Case Study**: Using propane dehydrogenation (PDH) catalysts as an example, this method quickly generated a comprehensive review from 343 articles, with an average processing time of only a few seconds per article.
2. **Data Mining and Visualization Analysis**: Through in-depth analysis of 1041 articles, profound insights into catalyst composition, structure, and performance were provided.
3. **Hallucination Mitigation Effect**: Expert validation confirmed that the generated review content was accurate, with complete citations, and the hallucination risk was reduced to below 0.5%, with a confidence level exceeding 95%.

### Conclusion

This method not only improves the efficiency and quality of literature processing but also promotes the discovery of new knowledge and innovation, which is of great significance for the development of contemporary scientific research. As an innovative literature processing tool, it is expected to become an important part of scientific research infrastructure, significantly advancing scientific progress.
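
### Illustrative Sketch of the Quality Control Layers

The four checks named under Solution (text format filtering, DOI verification, relevance verification, and self-consistency verification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `[doi:...]` citation format, the keyword-based relevance test, the agreement threshold, and every function name below are assumptions introduced only for illustration.

```python
import re
from collections import Counter

# Loose DOI shape: "10.", a 4-9 digit registrant prefix, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Assumed citation format for LLM answers: [doi:10.xxxx/yyyy]
CITATION_PATTERN = re.compile(r"\[doi:([^\]]+)\]")


def extract_dois(answer: str) -> list[str]:
    """Return every DOI cited in the assumed [doi:...] form."""
    return CITATION_PATTERN.findall(answer)


def passes_format_filter(answer: str) -> bool:
    """Layer 1: the answer must contain at least one citation tag."""
    return bool(extract_dois(answer))


def dois_are_known(answer: str, corpus_dois: set[str]) -> bool:
    """Layer 2: every cited DOI is well-formed and belongs to the input corpus."""
    dois = extract_dois(answer)
    return bool(dois) and all(
        DOI_PATTERN.match(d) is not None and d in corpus_dois for d in dois
    )


def is_relevant(answer: str, keywords: list[str]) -> bool:
    """Layer 3 (stand-in): the answer mentions at least one topic keyword."""
    text = answer.lower()
    return any(k.lower() in text for k in keywords)


def quality_control(samples, corpus_dois, keywords, min_agreement=0.6):
    """Layer 4: accept an answer only if enough repeated samples cite the same sources.

    `samples` holds several LLM answers to the same question.
    Returns one accepted answer, or None if the checks fail.
    """
    valid = [
        s for s in samples
        if passes_format_filter(s)
        and dois_are_known(s, corpus_dois)
        and is_relevant(s, keywords)
    ]
    if not valid:
        return None
    # Group surviving answers by the set of sources they cite.
    counts = Counter(frozenset(extract_dois(s)) for s in valid)
    top_set, top_count = counts.most_common(1)[0]
    # Agreement is measured against all samples, so rejected ones count against it.
    if top_count / len(samples) < min_agreement:
        return None
    return next(s for s in valid if frozenset(extract_dois(s)) == top_set)


if __name__ == "__main__":
    corpus = {"10.1000/abc123", "10.1000/xyz789"}  # hypothetical DOIs
    answers = [
        "Pt-Sn catalysts improve propylene selectivity [doi:10.1000/abc123]",
        "Pt-Sn alloys raise propylene selectivity [doi:10.1000/abc123]",
        "Propylene selectivity improves with Sn [doi:10.9999/fake000]",
    ]
    print(quality_control(answers, corpus, ["propylene", "catalyst"]))
```

In this sketch a fabricated DOI is rejected at the second layer because it does not appear in the input corpus, and the remaining answers are accepted only when they agree on their sources. A real pipeline would likely replace the keyword test with a stronger relevance check.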