Chinese News Event Corpus Construction Method Based on Syntax Tree

Sun Qingzhi, Du Qingfeng, Zhang Chenxi, Li Jun
DOI: https://doi.org/10.1145/3422713.3422741
2020-01-01
Abstract: At present, weakly supervised models are usually used to expand event corpora, which avoids the expensive manual annotation process. However, a weakly supervised model relies on a knowledge base and a small amount of manually annotated corpus data, which gives the model poor portability. To solve this problem, we construct a public-domain event extraction model using syntax trees. In this paper, we propose a classification structure for Chinese syntax trees from the viewpoint of event extraction, and put forward an event extraction algorithm for the various syntax tree types. Moreover, in the algorithm for constructing the trigger word dictionary, we use cross-corpus dictionary information to build a Chinese trigger word dictionary from a semantic perspective. As a result, we obtain 40,128 Chinese news events, which form an initial corpus of Chinese news events.