Aligning Coordinated Text Streams through Burst Information Network Construction and Decipherment

Tao Ge,Qing Dou,Xiaoman Pan,Heng Ji,Lei Cui,Baobao Chang,Zhifang Sui,Ming Zhou
DOI: https://doi.org/10.48550/arXiv.1609.08237
2016-09-27
Abstract:Aligning coordinated text streams from multiple sources and multiple languages has opened many new research venues on cross-lingual knowledge discovery. In this paper we aim to advance state-of-the-art by: (1). extending coarse-grained topic-level knowledge mining to fine-grained information units such as entities and events; (2). following a novel Data-to-Network-to-Knowledge (D2N2K) paradigm to construct and utilize network structures to capture and propagate reliable evidence. We introduce a novel Burst Information Network (BINet) representation that can display the most important information and illustrate the connections among bursty entities, events and keywords in the corpus. We propose an effective approach to construct and decipher BINets, incorporating novel criteria based on multi-dimensional clues from pronunciation, translation, burst, neighbor and graph topological structure. The experimental results on Chinese and English coordinated text streams show that our approach can accurately decipher the nodes with high confidence in the BINets and that the algorithm can be efficiently run in parallel, which makes it possible to apply it to huge amounts of streaming data for never-ending language and information decipherment.
Computation and Language
What problem does this paper attempt to address?