Abstract: Objective: Automatic text summarization offers an efficient way to access the ever-growing body of scientific and clinical literature in the biomedical domain by condensing source documents while preserving their most informative content. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Methods: Our summarizer uses the Unified Medical Language System (UMLS) to map the source document to concepts and build a concept-based model of it. It then discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to define a similarity function, on which a graph representation of the document is built. The summarizer then employs a minimum spanning tree based clustering algorithm to discover the various subthemes of the document. Finally, it generates the summary by selecting the most informative and relevant sentences from all subthemes within the text. Results: We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. Conclusion: The research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) enables the summarizer to target the main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain.
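The pipeline outlined in the Methods part of the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: sentences arrive already mapped to concept identifiers (the hypothetical IDs `C1` to `C6` stand in for UMLS concepts), frequent itemsets are found by brute-force enumeration rather than a dedicated mining algorithm, the similarity function is a Jaccard overlap of the frequent itemsets two sentences cover, and each cluster contributes the single sentence with the highest total itemset support.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset mining (adequate for toy inputs only).

    transactions: list of sets of concept IDs, one set per sentence.
    Returns a dict mapping each frequent itemset to its support count.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
    return result


def similarity(a, b):
    """Assumed similarity: Jaccard overlap of covered frequent itemsets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mst_clusters(sim, k):
    """Cluster nodes by running Kruskal on distance = 1 - similarity.

    Stopping once k components remain yields the same partition as
    building the full minimum spanning tree and cutting its k - 1
    heaviest edges.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (1.0 - sim[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def summarize(sentence_concepts, min_support=2, k=2):
    """Return indices of one representative sentence per cluster."""
    itemsets = frequent_itemsets(sentence_concepts, min_support)
    # Frequent itemsets fully contained in each sentence.
    covered = [{s for s in itemsets if s <= c} for c in sentence_concepts]
    n = len(covered)
    sim = [[similarity(covered[i], covered[j]) for j in range(n)]
           for i in range(n)]
    clusters = mst_clusters(sim, k)

    def score(i):
        # Assumed informativeness: total support of covered itemsets.
        return sum(itemsets[s] for s in covered[i])

    return sorted(max(cluster, key=score) for cluster in clusters)


# Toy document: each sentence is a set of hypothetical concept IDs.
sentences = [
    {"C1", "C2"},
    {"C1", "C2", "C3"},
    {"C1", "C2"},
    {"C4", "C5"},
    {"C4", "C5", "C6"},
]
summary = summarize(sentences, min_support=2, k=2)
print(summary)
```

On this toy input the first three sentences share one set of frequent itemsets and the last two share another, so the two clusters correspond to two subthemes and the summary takes one sentence from each. A real system would replace the brute-force miner with an algorithm such as Apriori or FP-growth and would need a principled way to choose the number of clusters and the summary length.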

Clustering cliques for graph-based summarization of the biomedical research literature

Graph-based biomedical text summarization: An itemset mining and sentence clustering approach

Small-world networks for summarization of biomedical articles

Exploring hypergraph-based semi-supervised ranking for query-oriented summarization

Clustering of Medical Publications for Evidence Based Medicine Summarisation

A Graph-Based Biomedical Literature Clustering Approach Utilizing Term's Global and Local Importance Information

Cited References and Medical Subject Headings (MeSH) as Two Different Knowledge Representations: Clustering and Mappings at the Paper Level

Text summarization for pharmaceutical sciences using hierarchical clustering with a weighted evaluation methodology

An Empirical Comparison of the Summarization Power of Graph Clustering Methods

Utilization of global ranking information in Graph-Based biomedical literature clustering

Generating Extractive Summaries of Scientific Paradigms

RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

Efficient Semisupervised MEDLINE Document Clustering with MeSH-Semantic and Global-Content Constraints

HyperSum: hypergraph based semi-supervised sentence ranking for query-oriented summarization.

Hierarchical Graph Summarization: Leveraging Hybrid Information through Visible and Invisible Linkage

Enhancing Medline Document Clustering by Incorporating Mesh Semantic Similarity

Alterations of blood glucose homeostasis in critically ill children - hyperglycemia.

Discourse-Aware Unsupervised Summarization of Long Scientific Documents

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers

Multi-view multi-objective clustering-based framework for scientific document summarization using citation context