Abstract: Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical categories and every node is linked to the studies assigned to that category. Our naive LLM-based pipeline for hierarchy generation from a set of studies produces promising yet imperfect hierarchies, motivating us to collect CHIME, an expert-curated dataset for this task focused on biomedicine. Given the challenging and time-consuming nature of building hierarchies from scratch, we use a human-in-the-loop process in which experts correct errors (both links between categories and study assignment) in LLM-generated hierarchies. CHIME contains 2,174 LLM-generated hierarchies covering 472 topics, and expert-corrected hierarchies for a subset of 100 topics. Expert corrections allow us to quantify LLM performance, and we find that while LLMs are quite good at generating and organizing categories, their assignment of studies to categories could be improved. We attempt to train a corrector model with human feedback that improves study assignment by 12.6 F1 points. We release our dataset and models to encourage research on developing better assistive tools for literature review.
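A hierarchy as defined in the abstract can be pictured as a recursive tree of category nodes, each holding the IDs of the studies assigned to it. The sketch below is a minimal illustration under stated assumptions: the `CategoryNode` class, its methods, the example topic, and the study IDs are all hypothetical and do not reflect the format of the released CHIME data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CategoryNode:
    """A topical category linked to the studies assigned to it.

    Hypothetical structure for illustration; not the CHIME data format.
    """
    name: str
    studies: set[str] = field(default_factory=set)  # IDs of studies assigned to this category
    children: list[CategoryNode] = field(default_factory=list)

    def add_child(self, child: CategoryNode) -> CategoryNode:
        """Attach a subcategory (a category-to-category link) and return it."""
        self.children.append(child)
        return child

    def walk(self, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], CategoryNode]]:
        """Yield (path from root, node) pairs in depth-first pre-order."""
        path = path + (self.name,)
        yield path, self
        for child in self.children:
            yield from child.walk(path)

    def assignments(self) -> set[tuple[str, str]]:
        """Flatten the tree into (category path, study ID) links."""
        return {
            (" > ".join(path), study)
            for path, node in self.walk()
            for study in node.studies
        }

    def render(self, depth: int = 0) -> str:
        """Indented outline showing how many studies are linked to each node."""
        lines = ["  " * depth + f"{self.name} [{len(self.studies)}]"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


# Hypothetical topic and study IDs, for illustration only.
root = CategoryNode("Vitamin D supplementation")
outcomes = root.add_child(CategoryNode("Clinical outcomes"))
outcomes.add_child(CategoryNode("Fracture risk", {"s1", "s2"}))
outcomes.add_child(CategoryNode("Mortality", {"s3"}))
print(root.render())
```

Keeping study links on each node, rather than in a separate table, mirrors the definition of "every node is linked to the studies assigned to that category"; `assignments()` flattens those links when they need to be compared across two versions of a hierarchy.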

What problem does this paper attempt to address?

The paper investigates how Large Language Models (LLMs) can help construct hierarchical organizations of scientific literature to support the literature review process. Specifically, it addresses the following issues:

1. **Challenges of literature review**: With the rapid growth in the number of scientific publications, completing a literature review has become increasingly time-consuming and difficult. In the medical field, for example, a review takes an average of 67 weeks from registration to publication.
2. **Limitations of automated literature review tools**: Existing tools mostly focus on generating literature reviews automatically, treating the task as multi-document summarization, and their effectiveness is limited. Studies have also found that domain experts prefer assistive tools over fully automatic ones.
3. **Building hierarchical organizations**: The paper proposes using LLMs to generate a tree-structured organization of scientific studies, in which nodes represent topical categories and each node links to the studies assigned to that category. The hierarchy is designed to assist researchers in conducting literature reviews.
4. **Evaluating and improving LLM performance**: The paper collects CHIME, an expert-curated dataset, to evaluate how well LLMs generate such hierarchies, and trains a "corrector" model with human feedback that automatically fixes errors in LLM-generated hierarchies, improving the accuracy of study assignment.
5. **Human involvement in the correction process**: Because building a hierarchy from scratch is difficult and time-consuming, the paper uses a human-in-the-loop protocol in which experts correct errors in preliminary LLM-generated hierarchies, including incorrect links between categories and incorrect assignments of studies to categories.

In summary, the paper explores the potential of LLMs to generate hierarchical organizations of the literature, improves their performance through an expert-curated dataset and a corrector model, and ultimately aims to develop better assistive tools that make literature review more efficient and of higher quality.
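Study-assignment quality is reported as F1. One plausible way to score it, assumed here rather than taken from the paper, treats each (category, study) link as an item and compares the set of links in the LLM-generated hierarchy against the expert-corrected one. The function name `assignment_f1` and the example links are hypothetical; this only illustrates the metric and does not reproduce the paper's reported numbers.

```python
from __future__ import annotations


def assignment_f1(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """F1 over (category, study) links; an assumed scoring scheme, not CHIME's exact metric."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    true_pos = len(predicted & gold)  # links the expert left unchanged
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the expert removes s2 from "Fracture risk" and adds s3 to "Mortality".
llm_links = {("Fracture risk", "s1"), ("Fracture risk", "s2"), ("Mortality", "s2")}
expert_links = {("Fracture risk", "s1"), ("Mortality", "s2"), ("Mortality", "s3")}
print(round(assignment_f1(llm_links, expert_links), 3))  # 0.667
```

Scoring each link separately means a study placed in several categories is counted once per placement, so spurious extra placements lower precision and missing placements lower recall.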

CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support
