RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He
2024-06-25
Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval-Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learning for Readability Control (RLRC) strategy improves readability, making scientific content comprehensible to non-specialists. Evaluations using the publicly accessible PLOS and eLife datasets show that our methods surpass the plain Gemini model, demonstrating a 20% increase in readability scores, a 15% improvement in ROUGE-2 relevance scores, and a 10% enhancement in factual accuracy. The RAG-RLRC-LaySum framework effectively democratizes scientific knowledge, enhancing public engagement with biomedical discoveries.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper primarily addresses the following issues:

1. **Making Complex Biomedical Literature Accessible**: Biomedical research is dense with specialized terminology and technical detail, which makes it hard for general readers to follow. Automated lay summarization systems are therefore valuable, because they can restate complex biomedical research in plain, easily understood language.
2. **Summary Accuracy**: Existing automated summarization systems show great potential but still fall short on accuracy, especially factual accuracy. To address this, the framework integrates external explanations to support content simplification while keeping the summary complete and accurate.
3. **Improving Readability**: Traditional fine-tuning often produces summaries that score well on relevance (e.g., high ROUGE scores) yet are not actually easy for humans to read. To overcome this limitation, the paper proposes a reward-based method that optimizes summary quality by adjusting the readability of the generated text.

Through these improvements, the RAG-RLRC-LaySum framework makes complex biomedical knowledge more accessible and supports greater public engagement with biomedical discoveries. Experimental results on the PLOS and eLife datasets show that the framework outperforms the plain Gemini model on readability, ROUGE-2 relevance, and factual accuracy.
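To make the idea of a reward-based readability method more concrete, here is a minimal sketch of what a readability reward could look like. It is not the paper's implementation: the summary above does not say which readability measure or reward shape RLRC uses. The sketch assumes the standard Flesch-Kincaid grade level formula with a rough vowel-group syllable heuristic. The function names and the `target_grade` parameter are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; lower values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def readability_reward(summary: str, target_grade: float = 8.0) -> float:
    """Hypothetical reward: 0.0 at or below the target grade, negative above it."""
    return min(0.0, target_grade - flesch_kincaid_grade(summary))


if __name__ == "__main__":
    plain = "The drug helps the heart. It is safe."
    dense = "Pharmacological inhibition of mitochondrial dysfunction ameliorates cardiomyopathy."
    print(readability_reward(plain), readability_reward(dense))
```

In a reinforcement learning setup, a scalar like this would typically be combined with a relevance or factuality signal so that the model is not rewarded for producing short, simple, but uninformative text. How the paper balances these signals is not stated in the summary above.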