Abstract: Text summarization produces a subset of the original content that represents its most important or relevant information, which effectively reduces information redundancy. Neural network methods have recently achieved good results on text summarization in both Chinese and English, but research on text summarization in low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, there is no large-scale annotated corpus for Tibetan text summarization, and this lack of datasets severely limits the development of low-resource text summarization. Unsupervised learning approaches are therefore more appealing for low-resource languages, as they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary for each topic. Summaries obtained with traditional graph-based methods have high redundancy, and their division of documents into topics is not detailed enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs. We then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain semantic clusters of sentences. This addresses the shortcomings of the traditional K-Means clustering method and yields a finer-grained clustering of documents. Next, we model the sentence clusters as graphs and re-score the sentence nodes based on topic semantic information and the influence of topic features on sentences, so that a summary with higher topic relevance is extracted. To promote the development of Tibetan text summarization and to meet researchers' need for high-quality Tibetan text summarization datasets, we manually construct a Tibetan summarization dataset and carry out experiments on it. The experimental results show that our method effectively improves the quality of summarization and is competitive with previous unsupervised methods.
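
As an illustration of the general idea of re-scoring sentence nodes in a topic cluster, the following is a minimal sketch and not the method proposed in this paper. It assumes whitespace-tokenized input, bag-of-words cosine similarity in place of the linguistic and deep representations, and a topic-biased PageRank in which each sentence's restart probability is proportional to its similarity to the cluster centroid. All function names (`bow`, `cosine`, `rank_sentences`, `summarize`) and parameter values are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words term counts for a whitespace-tokenized sentence."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score the sentences of one topic cluster with a topic-biased PageRank.

    Assumption: the "topic" is approximated by the sum of all sentence
    vectors in the cluster (its centroid).
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bow(s) for s in sentences]

    # Topic centroid of the cluster.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    # Restart distribution: similarity of each sentence to the topic.
    bias = [cosine(v, centroid) for v in vectors]
    total = sum(bias)
    bias = [b / total for b in bias] if total else [1.0 / n] * n

    # Weighted sentence graph (no self-loops).
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Power iteration; isolated nodes only receive the restart mass.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * bias[i]
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch, a sentence that shares no vocabulary with the rest of its cluster receives no incoming score and only a small restart weight, so it ranks last. Reducing redundancy among the selected sentences would require an additional step that is not shown here.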

GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

Query-focused Multi-document Summarization: Combining a Novel Topic Model with Graph-based Semi-supervised Learning

SEASum: Syntax-Enriched Abstractive Summarization

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

UPER: Boosting Multi-Document Summarization with an Unsupervised Prompt-based Extractor

GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model

Multi-Document Abstractive Summarization Using Chunk-graph and Recurrent Neural Network

GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization

Leveraging Graph to Improve Abstractive Multi-Document Summarization

Query-oriented unsupervised multi-document summarization via deep learning model

Unsupervised Multi-Granularity Summarization

Towards Unifying Multi-Lingual and Cross-Lingual Summarization

Multi-document Summarization Via Sentence-Level Semantic Analysis and Symmetric Matrix Factorization

Unsupervised Graph-Based Tibetan Multi-Document Summarization

Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Improving Unsupervised Extractive Summarization with Facet-Aware Modeling

Multi-granularity heterogeneous graph attention networks for extractive document summarization

Unsupervised Extractive Summarization with Heterogeneous Graph Embeddings for Chinese Document

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports