Abstract: Most current automatic summarization methods target English text. In Chinese text, words differ greatly from one another, parts of speech are numerous and complex, and polysemous or ambiguous words appear frequently. Extracting useful feature words is therefore harder for Chinese text than for English text. Because Chinese syntax is complex, relatively few automatic summarization methods exist for Chinese text. Earlier methods could only select the important sentences from the original text and arrange them simply, which yields summaries with disordered sentences and insufficient coherence. Moreover, because Chinese short text usually contains more redundant information and its sentence structure is irregular, we propose a topic-based automatic summarization method for Chinese short text. First, we propose a key sentence selection method that combines topic words and TF-IDF to score each text in the original data against a topic. The sentence with the highest score is then selected as the topic sentence for that topic. Since short Weibo texts may contain a lot of irrelevant information and sometimes even lack important components of the topic, we propose three retouching mechanisms to improve the conciseness, richness, and readability of the extracted topic sentences. We validate our approach on natural disaster and social hot event datasets from Sina Weibo. The experimental results show that the polished topic summary not only reflects the exact relationship between topic sentences and natural disasters or social hot events, but also carries rich semantic information. More importantly, the basic elements of a natural disaster or social hot event can largely be grasped from the topic sentence alone, which can help the government guide disaster relief and meet users' need to quickly obtain information about social hot events.
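The key sentence selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption: posts are taken to be already segmented into word lists (for example by a Chinese word segmenter), the score of a post is taken to be the sum of the TF-IDF weights of the topic words it contains, and the function names `tfidf_weights`, `topic_scores`, and `topic_sentence` are hypothetical. The three retouching mechanisms are not covered.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1; the paper's
    exact weighting is not stated in the abstract.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty posts
        weights.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


def topic_scores(docs, topic_words):
    """Score each document as the summed TF-IDF weight of its topic words.

    Assumption: additive combination of topic words and TF-IDF.
    """
    topic = set(topic_words)
    return [
        sum(w for term, w in doc_weights.items() if term in topic)
        for doc_weights in tfidf_weights(docs)
    ]


def topic_sentence(docs, topic_words):
    """Return (index, score) of the highest-scoring document for the topic."""
    scores = topic_scores(docs, topic_words)
    best = max(range(len(docs)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy, pre-segmented posts; English tokens stand in for Chinese words.
    posts = [
        ["earthquake", "sichuan", "magnitude", "rescue"],
        ["weather", "nice", "today"],
        ["earthquake", "traffic"],
    ]
    index, score = topic_sentence(posts, ["earthquake", "rescue", "magnitude"])
    print(index, round(score, 3))
```

Summing weights favors posts that mention several topic words at once. A real system would likely also need stop-word filtering and a tie-breaking rule, both of which are left out of this sketch.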

A Study of Chinese Text Summarization Using Adaptive Clustering of Paragraphs

Automatic Summarization for Chinese Text Based on Combined Words Recognition and Paragraph Clustering

A spectral analysis approach to document summarization: Clustering and ranking sentences simultaneously

Enhancing sentence-level clustering with ranking-based clustering framework for theme-based summarization

Topic-based automatic summarization algorithm for Chinese short text

Enhancing diversity and coverage of document summaries through subspace clustering and clustering-based optimization

Combining co-clustering with noise detection for theme-based summarization

Ranking Through Clustering: An Integrated Approach to Multi-Document Summarization

Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization

Research on Multi-document Summarization Using Lexical Cohesion

Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering

Automatic text summarization based on sentences clustering and extraction

Sentences clustering based automatic summarization

A Novel Automatic Summarization Method from Chinese Document

Chinese Text Automatic Summarization Based on Affinity Propagation Cluster

Towards More Effective Text Summarization Based on Textual Association Networks

A Novel Automatic Text Summarization Study Based on Term Co-Occurrence

Automatic Text Summarization Based on Textual Cohesion

Study on Academic Documents-oriented Automatic Summarization of Short Texts

Multi-document Summarization Based on Lexical Chains

Automatic Summarization Method Based on Thematic Term Set