RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He

2024-06-25

Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learning for Readability Control (RLRC) strategy improves readability, making scientific content comprehensible to non-specialists. Evaluations using the publicly accessible PLOS and eLife datasets show that our methods surpass the plain Gemini model, demonstrating a 20% increase in readability scores, a 15% improvement in ROUGE-2 relevance scores, and a 10% enhancement in factual accuracy. The RAG-RLRC-LaySum framework effectively democratizes scientific knowledge, enhancing public engagement with biomedical discoveries.
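The abstract names a two-stage idea (retrieve background knowledge, then rerank it before generation) but does not specify the retriever, the reranker, or the knowledge sources. The sketch below is only an illustration of that retrieve-then-rerank pattern under stated assumptions: the knowledge source is a hypothetical in-memory list of passages, stage 1 is a toy term-frequency cosine retriever, stage 2 is a toy IDF-weighted reranker, and `build_prompt` (hypothetical) simply prepends the top passages to the article before it would be sent to a generator such as Gemini.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "and", "for", "that", "by", "to"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=4):
    """Stage 1 (toy): rank every passage by term-frequency cosine, keep the top k."""
    q = Counter(tokenize(query))
    scored = sorted(((cosine(q, Counter(tokenize(p))), p) for p in passages),
                    key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def rerank(query, candidates, passages):
    """Stage 2 (toy): re-score candidates by IDF-weighted coverage of query terms."""
    n = len(passages)
    doc_terms = [set(tokenize(p)) for p in passages]
    q_terms = set(tokenize(query))
    idf = {t: math.log((n + 1) / (1 + sum(t in d for d in doc_terms))) + 1 for t in q_terms}
    return sorted(candidates,
                  key=lambda p: sum(idf[t] for t in q_terms & set(tokenize(p))),
                  reverse=True)


def build_prompt(article, query, passages, top_n=2):
    """Hypothetical helper: prepend the best reranked passages as background."""
    candidates = retrieve(query, passages, k=top_n * 2)
    context = rerank(query, candidates, passages)[:top_n]
    background = "\n".join(f"- {p}" for p in context)
    return (f"Background knowledge:\n{background}\n\n"
            f"Article:\n{article}\n\n"
            "Write a plain-language summary for a non-specialist reader.")


if __name__ == "__main__":
    # Hypothetical knowledge-source passages for illustration only.
    knowledge = [
        "Apoptosis is programmed cell death that removes damaged cells.",
        "Mitochondria produce energy for the cell.",
        "Tumor cells often evade apoptosis.",
        "Football is a popular sport.",
    ]
    print(build_prompt("Drug X restores apoptosis in tumor cells.",
                       "apoptosis in tumor cells", knowledge))
```

Splitting the work this way mirrors the usual motivation for reranking: a cheap first stage narrows the pool, and a second, more selective scorer decides which few passages actually reach the generator's context. In a real system both toy scorers would likely be replaced by dense retrievers and a learned cross-encoder.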

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper primarily addresses the following issues:

1. **Making complex biomedical literature accessible**: Biomedical research articles contain dense specialized terminology and technical detail that general readers find hard to follow. Automated lay summarization systems, which rewrite such research in plain language, are therefore particularly valuable.
2. **Summary accuracy**: Although existing automated summarization systems show great potential, they still fall short on accuracy, especially factual accuracy. To address this, the framework uses retrieval-augmented generation with reranking over multiple external knowledge sources, supplying background explanations that aid simplification while keeping the summary complete and accurate.
3. **Improving readability**: Conventional fine-tuning often produces summaries that score well on relevance metrics (e.g., high ROUGE scores) but are not actually easy for people to read. To overcome this limitation, the paper proposes a reward-based reinforcement learning method (RLRC) that optimizes summaries by controlling the readability of the generated text.

Through these improvements, the RAG-RLRC-LaySum framework makes complex biomedical knowledge more accessible and enhances public engagement with biomedical discoveries. Experiments on the PLOS and eLife datasets show that it outperforms the plain Gemini model on readability, ROUGE-2 relevance, and factual accuracy.
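The summary above does not state which readability metric or reward shape RLRC uses. As a minimal sketch, assuming the reward is derived from the Flesch-Kincaid Grade Level (FKGL) and a hypothetical `target_grade` of 8, a readability reward for one sampled summary could look like this. The syllable counter is a rough heuristic, and `readability_reward` is a hypothetical name, not the paper's function.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text):
    """Flesch-Kincaid Grade Level: higher means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def readability_reward(summary, target_grade=8.0):
    """Hypothetical reward: 0 at the target grade, more negative the further away."""
    return -abs(fkgl(summary) - target_grade)


if __name__ == "__main__":
    simple = "The drug helps the body fight germs. It is safe for kids."
    hard = ("Pharmacokinetic characterization demonstrated substantially "
            "immunomodulatory mechanisms underlying antimicrobial efficacy.")
    for candidate in (simple, hard):
        print(round(fkgl(candidate), 2), round(readability_reward(candidate), 2))
```

In a policy-gradient loop (for example PPO), a scalar like this would be computed for each sampled summary and used to update the generator. Measuring distance to a target grade, rather than rewarding ever-lower grades, also penalizes over-simplified text. In practice such a reward would probably be combined with a relevance or factuality term so that the model cannot win by emitting short, empty sentences.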
