CitationAS: A Summary Generation Tool Based on Clustering of Retrieved Citation Content

Jie Wang, Shutian Ma, Chengzhi Zhang
2017-01-01
Abstract: Researchers who want to understand the state of research in a field usually need to browse a large body of related academic literature. Automatic document summarization can make this work more efficient by giving a quick overview of a specific scientific topic. In this paper, we focus on generating summaries from citation content. We build an automatic tool named CitationAS, whose three core components are clustering algorithms, cluster label generation, and important-sentence extraction. In our experiments, we use bisecting K-means, Lingo, and STC to cluster retrieved citation content. Word2Vec, WordNet, and a combination of the two are then applied to generate cluster labels. Next, we employ two methods, TF-IDF and MMR, to extract important sentences, which are used to generate summaries. Finally, we evaluate the summaries obtained from CitationAS against a gold standard. From the evaluation, we identify the best label generation method for each clustering algorithm. We also find that, for all three clustering algorithms, combining Word2Vec and WordNet does not perform well compared with using either one separately. The combination of the Lingo algorithm, the Word2Vec label generation method, and the TF-IDF sentence extraction approach achieves the highest summary quality.
Conference Topic: Text mining and information extraction
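To make the sentence extraction step more concrete, here is a minimal sketch of MMR (Maximal Marginal Relevance) selection over TF-IDF vectors. It is an illustration under stated assumptions, not the CitationAS implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the trade-off parameter `lam` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (dict: term -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]  # naive tokenizer (assumption)
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences that are relevant to the query but not redundant."""
    vectors = tfidf_vectors(sentences + [query])
    query_vec = vectors.pop()  # last vector belongs to the query
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query_vec)
            # Highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    citation_sentences = [
        "citation clustering groups related citation sentences",
        "citation clustering groups related citation sentences",
        "labels describe each cluster topic",
    ]
    for sentence in mmr_select(citation_sentences, "citation clustering labels", k=2, lam=0.5):
        print(sentence)
```

With `lam` close to 1 the selection reduces to a plain relevance ranking; lower values penalize sentences that repeat what has already been selected, which is why the duplicated sentence in the usage example is skipped in favor of the one about labels.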