Discovering Topic Time from Web News

Xujian Zhao, Peiquan Jin, Lihua Yue
DOI: https://doi.org/10.1016/j.ipm.2015.04.001
IF: 7.466
2015-01-01
Information Processing & Management
Abstract: Topic time reflects the temporal feature of topics in Web news pages, which can be used to establish and analyze topic models for many time-sensitive text mining tasks. However, there are two critical challenges in discovering topic time from Web news pages. The first is how to normalize the different kinds of temporal expressions within a Web news page, such as explicit and implicit temporal expressions, into a unified representation framework. The second is how to determine the right topic time for the topics in Web news. To address these two problems, we propose a systematic framework for discovering topic time from Web news. In particular, for the first issue, we propose a new approach that can effectively determine the appropriate referential time for implicit temporal expressions, and we further present an effective defuzzification algorithm to find the right interpretation of a fuzzy temporal expression. For the second issue, we propose a relation model to describe the relationship between news topics and topic time. Based on this model, we design a new algorithm to extract topic time from Web news. We build a prototype system called Topic Time Parser (TTP) and conduct extensive experiments to measure the effectiveness of our proposal. The results suggest that our proposal is effective in both temporal expression normalization and topic time extraction.
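The abstract does not say how TTP normalizes temporal expressions, so the Python sketch below is only a rough illustration of the two normalization sub-problems it names, not the paper's method. It anchors implicit expressions ("yesterday", "last Monday") to a referential time and maps a fuzzy expression ("late February 2015") to a crisp date interval. The function names, the tiny vocabulary, the choice of the publication date as the referential time, and the "thirds of a month" convention for early/mid/late are all hypothetical assumptions made for this example. Choosing the appropriate referential time is precisely what the paper's approach addresses, and its defuzzification algorithm is not reproduced here.

```python
from datetime import date, timedelta
import calendar
from typing import Tuple

# Hypothetical vocabulary: relative day words -> offset in days from the referential time.
RELATIVE_DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical convention (not from the paper): "early", "mid", "late" split a month into thirds.
FUZZY_PARTS = {"early": (1, 10), "mid": (11, 20), "late": (21, None)}


def resolve_implicit(expr: str, reference: date) -> date:
    """Anchor an implicit temporal expression to an absolute date, given a referential time."""
    text = expr.lower().strip()
    if text in RELATIVE_DAY_OFFSETS:
        return reference + timedelta(days=RELATIVE_DAY_OFFSETS[text])
    if text.startswith("last ") and text[5:] in WEEKDAYS:
        target = WEEKDAYS.index(text[5:])
        # Most recent matching weekday strictly before the referential date.
        delta = (reference.weekday() - target) % 7 or 7
        return reference - timedelta(days=delta)
    raise ValueError(f"unsupported implicit expression: {expr!r}")


def defuzzify_month_part(part: str, year: int, month: int) -> Tuple[date, date]:
    """Map a fuzzy expression such as 'late February 2015' to a crisp date interval."""
    start_day, end_day = FUZZY_PARTS[part.lower()]
    if end_day is None:
        # calendar.monthrange returns (weekday of day 1, number of days in the month).
        end_day = calendar.monthrange(year, month)[1]
    return date(year, month, start_day), date(year, month, end_day)


if __name__ == "__main__":
    published = date(2015, 4, 1)  # assumed referential time: the page's publication date
    print(resolve_implicit("yesterday", published))    # 2015-03-31
    print(resolve_implicit("last Monday", published))  # 2015-03-30
    print(defuzzify_month_part("late", 2015, 2))
```

With an assumed publication date of 2015-04-01 (a Wednesday), "last Monday" resolves to 2015-03-30. A realistic normalizer would need a far larger vocabulary and, as the abstract stresses, a principled way to select the referential time instead of always falling back to the publication date.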