Document-Based HITS Model for Multi-document Summarization

Xiaojun Wan
DOI: https://doi.org/10.1007/978-3-540-89197-0_42
2008-01-01
Abstract: The PageRank model has been successfully applied to multi-document summarization by exploiting the link relationships between sentences in the document set, under the assumption that all sentences are indistinguishable from each other. However, the documents in a set are usually not equally important, and sentences in an important document are deemed more salient than sentences in a trivial one. This paper proposes the document-based HITS model (DocHITS), which fully leverages document-level information by treating documents as hubs and sentences as authorities. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed model.
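The hub/authority idea in the abstract can be illustrated with a minimal sketch of the standard HITS iteration on a document-sentence bipartite graph. The abstract does not specify how link weights are computed or how the scores are used to select sentences, so the weight matrix below (for instance, a document-sentence similarity) and the function name `doc_hits` are assumptions made for illustration only, not the paper's exact formulation.

```python
import math


def doc_hits(weights, iterations=50):
    """Standard HITS iteration with documents as hubs and sentences as authorities.

    weights[d][s] is an assumed non-negative link weight between document d
    and sentence s (e.g. a similarity score); the abstract does not define it.
    Returns (hub_scores, authority_scores), each L2-normalized.
    """
    n_docs = len(weights)
    n_sents = len(weights[0])
    hub = [1.0] * n_docs
    auth = [1.0] * n_sents

    for _ in range(iterations):
        # A sentence's authority is the weighted sum of the hub scores
        # of the documents linked to it.
        auth = [
            sum(weights[d][s] * hub[d] for d in range(n_docs))
            for s in range(n_sents)
        ]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]

        # A document's hub score is the weighted sum of the authority
        # scores of the sentences linked to it.
        hub = [
            sum(weights[d][s] * auth[s] for s in range(n_sents))
            for d in range(n_docs)
        ]
        norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        hub = [h / norm for h in hub]

    return hub, auth


# Hypothetical toy input: 2 documents, 3 sentences.
toy_weights = [
    [0.9, 0.8, 0.1],
    [0.1, 0.2, 0.1],
]
hub_scores, authority_scores = doc_hits(toy_weights)
# Rank sentences by authority score, highest first.
ranking = sorted(range(len(authority_scores)),
                 key=lambda s: authority_scores[s], reverse=True)
```

In this sketch, sentences strongly linked to a high-scoring document receive higher authority scores, which reflects the abstract's point that sentences in important documents should be treated as more salient. A summarizer would then pick top-ranked sentences, typically with some redundancy control that is outside the scope of this example.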