Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review

Oliver Vinzelberg, Mark David Jenkins, Gordon Morison, David McMinn, Zoe Tieges
2023-03-25
Abstract: Summarisation of research results in plain language is crucial for promoting public understanding of research findings. The use of Natural Language Processing to generate lay summaries has the potential to relieve researchers' workload and bridge the gap between science and society. The aim of this narrative literature review is to describe and compare the different text summarisation approaches used to generate lay summaries. We searched the databases Web of Science, Google Scholar, IEEE Xplore, Association for Computing Machinery Digital Library and arXiv for articles published until 6 May 2022. We included original studies on automatic text summarisation methods to generate lay summaries. We screened 82 articles and included eight relevant papers published between 2020 and 2021, all using the same dataset. The results show that transformer-based methods such as Bidirectional Encoder Representations from Transformers (BERT) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) dominate the landscape of lay text summarisation, with all but one study using these methods. A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective. Furthermore, pre-processing approaches to input text (e.g. applying extractive summarisation), or determining which sections of a text to include, appear critical. Evaluation metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) were used, which do not consider readability. To conclude, automatic lay text summarisation is under-explored. Future research should consider long document lay text summarisation, including clinical trial reports, and the development of evaluation metrics that consider readability of the lay summary.
Computation and Language
What problem does this paper attempt to address?
The paper examines how well automatic text summarisation can generate concise, easy-to-understand summaries of research for non-expert readers (lay summaries). Specifically, it describes and compares the natural language processing (NLP) techniques, especially transformer-based methods, that have been used to generate such summaries. The study focuses on the following core issues:

1. **What NLP techniques have been applied to lay text summarisation?** The study covers extractive, abstractive, and hybrid methods, and how effective each is at generating lay summaries.
2. **How is the performance of lay text summarisation models evaluated?** The included studies rely on metrics such as ROUGE, which do not consider the readability of the summaries.
3. **Which methods for lay text summarisation are the most effective?** Hybrid strategies combining extractive and abstractive methods were found to be most effective. Pre-trained transformer models such as BERT and PEGASUS dominate the field: all but one of the eight included studies use them.
4. **What are the main challenges and future research directions?** The study highlights limitations in current research, such as small datasets that may lead to overfitting, and the lack of evaluation metrics that consider the readability of summaries. Future research should explore lay summarisation of long documents, including clinical trial reports, and develop readability-aware evaluation metrics.

By discussing these issues, the study hopes to support the effective dissemination of scientific research results to the public, enabling non-experts to understand complex scientific concepts and research findings, and thereby enhancing societal awareness of and support for scientific research.
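
The observation that ROUGE does not consider readability can be illustrated with a minimal sketch. The code below is not taken from the reviewed studies: `rouge1` is a simplified unigram-overlap score (no stemming or stopword handling), Flesch Reading Ease is used here only as one example of a readability measure, the syllable counter is a rough vowel-group heuristic, and the example sentences are made up for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a simplification of real ROUGE tokenisation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1)."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores indicate easier text."""
    words = tokens(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


# Hypothetical example: two candidates that differ in a single word.
reference = "The drug lowered blood pressure in most patients."
plain = "The drug lowered blood pressure in people."
jargon = "The drug lowered blood pressure in normotensives."

for name, summary in [("plain", plain), ("jargon", jargon)]:
    p, r, f = rouge1(summary, reference)
    print(name, "ROUGE-1 F1:", round(f, 3),
          "Flesch:", round(flesch_reading_ease(summary), 1))
```

Both candidates share the same unigrams with the reference, so the simplified ROUGE-1 score cannot tell them apart, while the readability score penalises the jargon term. A readability-aware evaluation of lay summaries would need to combine both kinds of signal.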