Abstract:

Motivation: Curation of literature in the life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole of the relevant literature, and all have to prioritise their efforts.

Results: In this work, we take a first step towards alleviating the lack of curator time in RNA science by generating summaries of the literature for non-coding RNAs (ncRNAs) using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated as extremely high quality. We also applied the most commonly used automated evaluation approaches, finding that they do not correlate with human assessment. Finally, we apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied.

Availability: Code used to produce these summaries can be found at https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found at https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/).

Computational Linguistics Literature and Citations Oriented Citation Linkage, Classification and Summarization

Automatic Text Summarization Based on Latent Semantic Indexing

COVIDSum: A Linguistically Enriched SciBERT-based Summarization Model for COVID-19 Scientific Papers

Research on automatic text summarization based on latent semantic indexing

Scientific document summarization via citation contextualization and scientific discourse

Generating Extractive Summaries of Scientific Paradigms

SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation

CitationAS: A Summary Generation Tool Based on Clustering of Retrieved Citation Content

CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization

When Large Language Models Meet Citation: A Survey

WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations

Event-based Summarization Method for Scientific Literature

Citation Based Collaborative Summarization of Scientific Publications by a New Sentence Similarity Measure

X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

LitSumm: Large language models for literature summarisation of non-coding RNAs

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers

Enhancing Abstractive Summarization of Scientific Papers Using Structure Information

Design and implementation for literature search and impact-based summaries

Enhancing Scientific Papers Summarization with Citation Graph

Expanding Citations in a Paper by Summarizing References Based on Co-Occurring Terms