Simplifying Scholarly Abstracts for Accessible Digital Libraries

Haining Wang, Jason Clark

2024-08-08

Abstract: Standing at the forefront of knowledge dissemination, digital libraries curate vast collections of scientific literature. However, these scholarly writings are often laden with jargon and tailored for domain experts rather than the general public. As librarians, we strive to offer services to a diverse audience, including those with lower reading levels. To extend our services beyond mere access, we propose fine-tuning a language model to rewrite scholarly abstracts into more comprehensible versions, thereby making scholarly literature more accessible when requested. We began by introducing a corpus specifically designed for training models to simplify scholarly abstracts. This corpus consists of over three thousand pairs of abstracts and significance statements from diverse disciplines. We then fine-tuned four language models using this corpus. The outputs from the models were subsequently examined both quantitatively for accessibility and semantic coherence, and qualitatively for language quality, faithfulness, and completeness. Our findings show that the resulting models can improve readability by over three grade levels, while maintaining fidelity to the original content. Although commercial state-of-the-art models still hold an edge, our models are much more compact, can be deployed locally in an affordable manner, and alleviate the privacy concerns associated with using commercial models. We envision this work as a step toward more inclusive and accessible libraries, improving our services for young readers and those without a college degree.

Computation and Language, Artificial Intelligence, Computers and Society, Digital Libraries

What problem does this paper attempt to address?

The paper addresses the poor readability of scholarly abstracts, which are written for domain experts and are often out of reach for the general public, including readers with lower reading levels. The authors propose fine-tuning language models to rewrite abstracts into more comprehensible versions while preserving the original meaning. To that end, they introduce a corpus of over three thousand pairs of abstracts and significance statements from diverse disciplines, fine-tune four language models on it, and examine the outputs quantitatively (accessibility and semantic coherence) and qualitatively (language quality, faithfulness, and completeness). The resulting models improve readability by over three grade levels while remaining faithful to the original content. Commercial state-of-the-art models still hold an edge, but the fine-tuned models are much more compact and can be deployed locally at low cost, which alleviates the privacy concerns associated with commercial models. The ultimate goal is for digital libraries to offer simplified abstracts on request, so that more people, including young readers and those without a college degree, can understand scientific research.
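To make the "grade levels" claim concrete, the sketch below estimates how many grade levels a rewrite gains. It is only an illustration: the summary above does not say which readability formula the authors used, so the Flesch-Kincaid grade level is assumed here, and the function names (`count_syllables`, `fk_grade`, `grade_gain`) and the crude syllable heuristic are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def grade_gain(original: str, simplified: str) -> float:
    """Positive values mean the simplified text reads at a lower grade level."""
    return fk_grade(original) - fk_grade(simplified)


if __name__ == "__main__":
    # Made-up example sentences, not taken from the paper's corpus.
    original = ("Digital libraries curate voluminous scientific literature "
                "characterized by specialized terminology.")
    simplified = "Libraries keep many science papers. The papers use hard words."
    print(f"Estimated gain: {grade_gain(original, simplified):.1f} grade levels")
```

A regex-based syllable counter is only approximate; a dictionary-backed counter would give steadier scores, but the comparison between an original and its rewrite works the same way.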

Amplifying Scientific Paper's Abstract by Leveraging Data-Weighted Reconstruction

Qlarify: Recursively Expandable Abstracts for Directed Information Retrieval over Scientific Papers

Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning

Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts

Society of Medical Simplifiers

Improving accessibility of scientific research by artificial intelligence—An example for lay abstract generation

Biomedical text readability after hypernym substitution with fine-tuned large language models

Know Your Audience: The benefits and pitfalls of generating plain language summaries beyond the "general" audience

Automated Lay Language Summarization of Biomedical Scientific Reviews

Readability Controllable Biomedical Document Summarization

Text Simplification of Scientific Texts for Non-Expert Readers

Structuralizing biomedical abstracts with discriminative linguistic features

Large Language Models for Biomedical Text Simplification: Promising But Not There Yet

Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

Leveraging artificial intelligence to summarize abstracts in lay language for increasing research accessibility and transparency

Exploring Large Language Models to generate Easy to Read content

Paragraph-level Simplification of Medical Texts

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding