Mining Event Temporal Boundaries from News Corpora Through Evolution Phase Discovery

Liang Kong, Rui Yan, Han Jiang, Yan Zhang, Yan Gao, Li Fu
DOI: https://doi.org/10.1007/978-3-642-23535-1_47
2011-01-01
Abstract: News now floods the web. Event Detection and Tracking techniques make it feasible to gather and structure text into events that are constructed online automatically and updated over time. Users are usually eager to browse the whole evolution of an event, but with the huge quantity of documents it is almost impossible for them to read everything. In this paper, we formally define the problem of event evolution phase discovery. We introduce a novel and principled model, called EPD, that aims to outline the entire development of a news event along its timeline. A news document is usually not atomic but consists of independent news segments related to the same event. Therefore we first employ a latent ingredient extraction method to extract event snippets. Unlike traditional clustering methods, we propose a novel metric that integrates content, temporal, distribution and bursty features to measure the correlation between snippets along the timeline of a specific event. Using the bursty feature, we also introduce a novel method for computing word weights. We employ hierarchical agglomerative clustering (HAC) to group the news snippets into diversified phases. An optimization problem is formulated to decide the number of phases, which makes EPD applicable in practice. With our novel evaluation method, empirical experiments on two real datasets show that EPD is effective and outperforms various related algorithms. Automatic event chronicle generation is introduced as a typical application of EPD.
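The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' EPD implementation: the snippet representation, the `snippet_similarity` blend, and the `alpha` and `decay` parameters are all assumptions made for illustration. The sketch combines only a content feature (bag-of-words cosine) and a temporal feature (exponential decay over the day gap), leaves out the distribution and bursty features, and takes the number of phases as a fixed input rather than deriving it from an optimization problem.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def snippet_similarity(s1, s2, alpha=0.5, decay=0.5):
    """Hypothetical blend of content similarity and temporal closeness.

    alpha and decay are illustrative assumptions, not values from the paper.
    """
    content = cosine(s1["words"], s2["words"])
    temporal = math.exp(-decay * abs(s1["day"] - s2["day"]))
    return alpha * content + (1 - alpha) * temporal


def hac(snippets, num_phases):
    """Average-linkage agglomerative clustering down to num_phases clusters."""
    clusters = [[i] for i in range(len(snippets))]
    while len(clusters) > num_phases:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [
                    snippet_similarity(snippets[a], snippets[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, best_pair = avg, (i, j)
        i, j = best_pair
        # Merge the most similar pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def make_snippet(text, day):
    """Build a toy snippet: lowercase word counts plus a day offset."""
    return {"words": Counter(text.lower().split()), "day": day}


# Made-up toy snippets, purely for illustration.
snippets = [
    make_snippet("earthquake strikes coastal city", 0),
    make_snippet("earthquake damage reported in city", 1),
    make_snippet("rescue teams search for survivors", 5),
    make_snippet("rescue teams find survivors", 6),
]
phases = hac(snippets, num_phases=2)
print(phases)
```

Each inner list holds the indices of the snippets assigned to one phase. Average linkage is used here only because it is a common HAC variant; the abstract does not say which linkage criterion EPD uses.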