Detecting Reference Errors in Scientific Literature with Large Language Models

Tianmai M. Zhang, Neil F. Abernethy
2024-11-09
Abstract: Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and time-consuming to detect, posing a significant challenge to scientific publishing. To support automatic detection of reference errors, this work evaluated the ability of large language models in OpenAI's GPT family to detect quotation errors. Specifically, we prepared an expert-annotated, general-domain dataset of statement-reference pairs from journal articles. Large language models were evaluated in different settings with varying amounts of reference information provided by retrieval augmentation. Our results showed that large language models are able to detect erroneous citations with limited context and without fine-tuning. This study contributes to the growing literature that seeks to utilize artificial intelligence to assist in the writing, reviewing, and publishing of scientific papers. Potential avenues for further improvements in this task are also discussed.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the detection of reference errors, such as citation and quotation errors, in scientific literature. Such errors are common in scientific papers and can propagate inaccurate information, undermining the credibility of scientific research, with potentially serious consequences. Detecting them is difficult and time-consuming, because it takes expertise to compare each statement with the relevant information in the cited reference. The paper therefore evaluates how well large language models (LLMs) in OpenAI's GPT family detect reference errors, focusing on quotation errors. Using an expert-annotated, general-domain dataset of statement-reference pairs drawn from journal articles, the researchers tested the models in settings that vary the amount of reference information supplied through retrieval augmentation. The results indicate that, even without fine-tuning, LLMs can detect erroneous citations from limited context, which opens new possibilities for using artificial intelligence to assist in the writing, reviewing, and publishing of scientific papers.
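
The general idea of checking a statement against a limited amount of retrieved reference text can be illustrated with a small sketch. This is not the authors' pipeline: the token-overlap retriever, the prompt wording, the example statement and passages, and all function names (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`) are assumptions made for illustration only. The call to a language model is deliberately left out; the sketch only shows how a statement-reference pair might be turned into a short prompt.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(statement, passage):
    """Count the tokens shared by statement and passage (multiset intersection)."""
    shared = Counter(tokenize(statement)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(statement, passages, k=2):
    """Return the k passages that share the most tokens with the statement.

    A stand-in for a real retriever (assumption); ties keep their input order.
    """
    ranked = sorted(passages, key=lambda p: overlap_score(statement, p), reverse=True)
    return ranked[:k]


def build_prompt(statement, context):
    """Assemble an illustrative prompt from the statement and retrieved excerpts."""
    excerpts = "\n".join(f"- {c}" for c in context)
    return (
        "Reference excerpts:\n" + excerpts + "\n\n"
        "Statement: " + statement + "\n"
        "Do the excerpts support the statement? "
        "Answer 'supported' or 'unsupported'."
    )


# Hypothetical statement-reference pair; the statement misquotes the figure.
statement = "Drug X reduced mortality by 30 percent in adults."
passages = [
    "The trial enrolled 500 adults across four sites.",
    "Drug X reduced mortality by 10 percent in adults.",
    "Funding was provided by a national agency.",
]

context = retrieve(statement, passages, k=1)
prompt = build_prompt(statement, context)
print(prompt)
```

Varying `k` changes how much reference text reaches the model, which loosely mirrors the idea of evaluating under different amounts of reference information. The resulting prompt would then be passed to whichever model is being evaluated.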