Topic-Aware Modeling for Unsupervised Extractive Summarization

Zhihao Fan, Huiyong Li, Shasha Mo, Jianwei Niu
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191124
2023-01-01
Abstract: The recent success of extractive summarization depends on the availability of large-scale annotated datasets. Existing unsupervised approaches are mostly based on directed graphs, combining position information with centrality computation. Summaries produced by these methods tend to suffer from two problems: low topic coverage of the source document (the facet bias problem) and extracted sentences that cluster in contiguous positions (the position bias problem). To solve these problems, we propose the topic-aware centrality-based summarization method (TACSUM). Specifically, we use clustering techniques to model the topics of the document explicitly and define two metrics, topic consistency and topic coverage, to improve summarization performance. Topic consistency guides the calculation of centrality, which solves the position bias problem and generalizes better across different scenarios. Topic coverage is combined with centrality to enhance the topic awareness of the model, which ensures that the selected sentences are both important and diverse. Experimental results on four datasets show that our method outperforms previous unsupervised methods, especially on long documents. Extensive analyses confirm that our method generates high-quality summaries by eliminating the position bias and facet bias problems.
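The pipeline described in the abstract (cluster sentences into topics, let topics shape centrality, then reward topic coverage during selection) can be illustrated with a minimal sketch. This is not the TACSUM implementation: the abstract does not define the topic consistency or topic coverage metrics, so the sketch assumes stand-ins. Same-cluster similarity is weighted more heavily than cross-cluster similarity as a proxy for topic consistency, and a fixed bonus for sentences from not-yet-covered clusters acts as a proxy for topic coverage. The function names, parameters, and default weights are all hypothetical, and the sentence embeddings are assumed to be given as a NumPy array.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means (illustrative only); returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels


def summarize(X, k=2, n_select=2, same_topic_weight=1.0,
              cross_topic_weight=0.5, coverage_bonus=1.0):
    """Pick sentence indices by topic-weighted centrality plus a coverage bonus.

    All weights are assumed values for illustration, not from the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = kmeans(X, k)

    # Cosine similarity between sentence embeddings, ignoring self-similarity.
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)

    # Assumed proxy for topic consistency: same-cluster edges count more.
    same_topic = labels[:, None] == labels[None, :]
    weights = np.where(same_topic, same_topic_weight, cross_topic_weight)
    centrality = (weights * sim).sum(axis=1)

    # Greedy selection; assumed proxy for topic coverage: a fixed bonus
    # for sentences whose cluster is not yet represented in the summary.
    selected, covered = [], set()
    while len(selected) < min(n_select, len(X)):
        best, best_score = None, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            bonus = coverage_bonus if int(labels[i]) not in covered else 0.0
            score = centrality[i] + bonus
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered.add(int(labels[best]))
    return sorted(selected), labels
```

Calling `summarize(embeddings, k=3, n_select=3)` returns the chosen sentence indices in document order together with the cluster labels. Note that no position information enters the score in this sketch, which is one simple way to avoid favoring contiguous sentences; how the paper itself handles position is not stated in the abstract.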