Summarizing large-scale, multiple-document news data: sparse methods and human validation

Jinzhu Jia, Luke Miratrix, Brian Gawalt, Bin Yu, Laurent El Ghaoui
2013-01-01
Abstract: News media significantly drives the course of events. Understanding how has long been an active and important area of research. Now, as the amount of online news media available grows, there is even more information calling for analysis, an ever-increasing range of inquiry that one might conduct. We believe subject-specific summarization of multiple news documents at once can help. In this paper we adapt scalable statistical techniques to perform this summarization under a predictive framework using a vector space model of documents. We reduce corpora of many millions of words to a few representative key-phrases that describe a specified subject of interest. We propose this as a tool for news media study. We consider the efficacies of four different