CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, Aakanksha Naik
2024-07-23
Abstract: Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical categories and every node is linked to the studies assigned to that category. Our naive LLM-based pipeline for hierarchy generation from a set of studies produces promising yet imperfect hierarchies, motivating us to collect CHIME, an expert-curated dataset for this task focused on biomedicine. Given the challenging and time-consuming nature of building hierarchies from scratch, we use a human-in-the-loop process in which experts correct errors (both links between categories and study assignment) in LLM-generated hierarchies. CHIME contains 2,174 LLM-generated hierarchies covering 472 topics, and expert-corrected hierarchies for a subset of 100 topics. Expert corrections allow us to quantify LLM performance, and we find that while LLMs are quite good at generating and organizing categories, their assignment of studies to categories could be improved. We attempt to train a corrector model with human feedback that improves study assignment by 12.6 F1 points. We release our dataset and models to encourage research on developing better assistive tools for literature review.
Computation and Language
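The abstract defines a hierarchy as a tree whose nodes are topical categories, each linked to the studies assigned to it. The Python sketch below shows one way such a structure could be represented; the `CategoryNode` class, its fields, and the toy topic, category names, and study IDs are hypothetical and are not taken from the CHIME release.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CategoryNode:
    # Hypothetical representation: a topical category, the IDs of the
    # studies linked to it, and its child categories.
    name: str
    study_ids: set[str] = field(default_factory=set)
    children: list[CategoryNode] = field(default_factory=list)

    def add_child(self, child: CategoryNode) -> CategoryNode:
        """Attach a subcategory and return it so it can be extended further."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, CategoryNode]]:
        """Yield (depth, node) pairs in pre-order (parent before children)."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def assignments(self) -> set[tuple[str, str]]:
        """Collect every (category name, study ID) link in this subtree."""
        return {(node.name, sid) for _, node in self.walk() for sid in node.study_ids}


# Toy hierarchy; the topic, category names, and study IDs are made up.
root = CategoryNode("Exercise interventions for depression")
aerobic = root.add_child(CategoryNode("Aerobic exercise", {"S1", "S2"}))
aerobic.add_child(CategoryNode("Older adults", {"S2"}))
root.add_child(CategoryNode("Resistance training", {"S3"}))

# Print each category indented by its depth, followed by its sorted study IDs.
for depth, node in root.walk():
    print("  " * depth + f"{node.name} {sorted(node.study_ids)}")
```

Running it prints the four categories in pre-order, indented by depth. The sketch allows one study to be linked to several categories (S2 appears under both a parent and its child); whether a hierarchy should permit this is a design choice the sketch leaves open.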
What problem does this paper attempt to address?
The paper explores how Large Language Models (LLMs) can help build hierarchical organizations of scientific studies to support the literature review process. Specifically, it addresses the following issues:

1. **Challenges of Literature Review**: As the number of scientific publications grows rapidly, completing a literature review has become increasingly time-consuming and difficult. In medicine, for example, a review takes an average of 67 weeks from registration to publication.
2. **Limitations of Automated Literature Review Tools**: Most existing tools aim to generate literature reviews fully automatically, treating the task as multi-document summarization, but their effectiveness is limited. Studies have also found that domain experts prefer assistive rather than fully automatic literature review tools.
3. **Building Hierarchical Organizations**: The paper proposes using LLMs to generate a tree-structured organization of a set of studies, in which nodes represent topical categories and each node links to the studies assigned to that category. The hierarchy is intended as an aid for researchers conducting literature reviews.
4. **Evaluating and Improving LLM Performance**: The authors collected CHIME, an expert-curated biomedical dataset containing 2,174 LLM-generated hierarchies covering 472 topics, with expert-corrected hierarchies for 100 of those topics. They use it to measure how well LLMs build hierarchies, finding that LLMs are good at generating and organizing categories but weaker at assigning studies to them. They then train a "corrector" model with human feedback that improves study assignment by 12.6 F1 points.
5. **Human Involvement in the Correction Process**: Because building such a hierarchy from scratch is difficult and time-consuming, the paper uses a human-in-the-loop protocol in which experts correct errors in LLM-generated hierarchies, covering both incorrect links between categories and incorrect assignment of studies to categories.

In summary, the paper explores the potential of LLMs to generate hierarchical organizations of the literature and improves on them with an expert-curated dataset and a corrector model, with the ultimate aim of building better assistive tools that make literature review more efficient and of higher quality.
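Expert corrections make it possible to score an LLM's study assignment against a reference. One simple way to do this, assumed here rather than taken from the paper, is to treat each assignment as a (category, study) link and compute set-level precision, recall, and F1 between the LLM's links and the expert-corrected links. The function `assignment_prf`, the toy links, and the "corrector" output below are hypothetical and only illustrate the arithmetic.

```python
from __future__ import annotations


def assignment_prf(predicted: set[tuple[str, str]],
                   gold: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Precision, recall, and F1 of predicted (category, study) links vs. gold links."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (made up): the LLM filed study S3 under the wrong category and
# missed that S2 also belongs to "Older adults"; the expert fixed both.
llm_links = {
    ("Aerobic exercise", "S1"),
    ("Aerobic exercise", "S2"),
    ("Aerobic exercise", "S3"),
}
expert_links = {
    ("Aerobic exercise", "S1"),
    ("Aerobic exercise", "S2"),
    ("Older adults", "S2"),
    ("Resistance training", "S3"),
}
# Hypothetical corrector output: it moves S3 but still misses the S2 link.
corrected_links = {
    ("Aerobic exercise", "S1"),
    ("Aerobic exercise", "S2"),
    ("Resistance training", "S3"),
}

p, r, f1 = assignment_prf(llm_links, expert_links)
_, _, f1_corr = assignment_prf(corrected_links, expert_links)
print(f"LLM: precision={p:.3f} recall={r:.3f} F1={100 * f1:.1f}")
print(f"corrector gain: {100 * (f1_corr - f1):.1f} F1 points")
```

Here the LLM links score F1 = 4/7 (57.1) and the hypothetical corrected links 6/7 (85.7), a 28.6-point gain. These toy numbers say nothing about the paper's reported 12.6-point improvement; they only show how an F1 difference between two sets of assignments is computed.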