An Unsupervised Semantic Sentence Ranking Scheme for Text Documents

Hao Zhang,Jie Wang
DOI: https://doi.org/10.48550/arXiv.2005.02158
2020-04-29
Abstract:This paper presents Semantic SentenceRank (SSR), an unsupervised scheme for automatically ranking sentences in a single document according to their relative importance. In particular, SSR extracts essential words and phrases from a text document, and uses semantic measures to construct, respectively, a semantic phrase graph over phrases and words, and a semantic sentence graph over sentences. It applies two variants of article-structure-biased PageRank to score phrases and words on the first graph and sentences on the second graph. It then combines these scores to generate the final score for each sentence. Finally, SSR solves a multi-objective optimization problem for ranking sentences based on their final scores and topic diversity through semantic subtopic clustering. An implementation of SSR that runs in quadratic time is presented, and it outperforms, on the SummBank benchmarks, each individual judge's ranking and compares favorably with the combined ranking of all judges.
Information Retrieval,Computation and Language,Machine Learning
What problem does this paper attempt to address?