Using Topic Labels for Text Summarization.

Wanqiu Kou, Fang Li, Zhe Ye
DOI: https://doi.org/10.1007/978-3-319-60045-1_46
2017-01-01
Abstract: Multi-document summarization is a difficult natural language processing task. Many extractive summarization methods consist of two steps: extracting the important concepts of the documents and selecting sentences based on those concepts. In this paper, we introduce a method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or external resources. Sentences are selected based on these topic labels to form a summary. Two selection methods are proposed in the paper. Experiments on the DUC2004 dataset show that the vector-based method is better, i.e., mapping topic labels and sentences to a word vector space and a letter trigram vector space to find the sentences that are syntactically and semantically related to the topic labels. Experiments show that the produced summaries are informative, abstractive, and better than those of the baseline method.
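The letter trigram half of the vector-based selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring rule, so the `#` word-boundary padding, the use of cosine similarity, the choice to score each sentence by its best-matching label, and the function names `letter_trigrams`, `cosine`, and `select_sentences` are all assumptions made for illustration. The word vector space and the LDA labeling step are omitted; topic labels are passed in as plain strings.

```python
from collections import Counter
from math import sqrt


def letter_trigrams(text):
    """Count letter trigrams per word; '#' boundary padding is an assumption."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(ch for ch in word if ch.isalnum())
        if not word:
            continue
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_sentences(sentences, topic_labels, k=2):
    """Pick the k sentences closest to any topic label, kept in document order.

    Scoring by the best-matching label is an illustrative assumption.
    """
    label_vecs = [letter_trigrams(label) for label in topic_labels]
    scored = []
    for idx, sentence in enumerate(sentences):
        vec = letter_trigrams(sentence)
        score = max((cosine(vec, lv) for lv in label_vecs), default=0.0)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[idx] for _, idx in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    docs = [
        "The earthquake destroyed many buildings in the city.",
        "Stock markets rose sharply on Monday.",
        "Rescue teams searched for earthquake survivors.",
    ]
    for line in select_sentences(docs, ["earthquake rescue"], k=2):
        print(line)
```

Letter trigrams capture surface overlap, so a label such as "rescue" still partly matches "rescued" or "rescuers"; the semantic side of the matching described in the abstract would come from the word vector space, which this sketch leaves out.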