A Chinese-English parallel corpus for information extraction

Hao-tian HUI, Yun-jian LI, Long-hua QIAN, Guo-dong ZHOU
DOI: https://doi.org/10.3969/j.issn.1007-130X.2015.12.021
2015-01-01
Abstract: In addition to machine translation, parallel corpora play an important role in information retrieval, information extraction, and knowledge acquisition. However, traditional parallel corpora are aligned only at the sentence level, which limits their usefulness for research on cross-language natural language processing. To address this, we build on OntoNotes to construct a high-quality Chinese-English parallel corpus for information extraction by combining automatic extraction, automatic mapping, and manual annotation. The corpus annotates entities and the relations between them, and aligns the Chinese and English sides at both the entity and relation levels. It can therefore support comparative studies of information extraction in Chinese and English, reveal differences in semantic expression between the two languages, and provide a valuable platform for research on cross-language information extraction.
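
As an illustration of what alignment at both the entity and relation levels might look like, the following is a minimal sketch in Python. The abstract does not describe the corpus file format, so every class name, field name, label (`PER`, `ORG`, `FOUNDER`), identifier, and the example sentence pair here are assumptions made up for illustration, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    entity_id: str
    text: str
    entity_type: str  # hypothetical label such as "PER" or "ORG"


@dataclass(frozen=True)
class Relation:
    relation_id: str
    relation_type: str  # hypothetical label such as "FOUNDER"
    arg1: str  # entity_id of the first argument
    arg2: str  # entity_id of the second argument


@dataclass
class AlignedPair:
    """One sentence pair with entity-level and relation-level links (hypothetical layout)."""
    zh_entities: Dict[str, Entity]
    en_entities: Dict[str, Entity]
    zh_relations: Dict[str, Relation]
    en_relations: Dict[str, Relation]
    # Chinese id -> English id
    entity_links: Dict[str, str] = field(default_factory=dict)
    relation_links: Dict[str, str] = field(default_factory=dict)

    def consistent_relation_links(self) -> List[Tuple[str, str]]:
        """Return relation links whose two arguments are also linked at the entity level."""
        consistent = []
        for zh_id, en_id in self.relation_links.items():
            zh_rel = self.zh_relations[zh_id]
            en_rel = self.en_relations[en_id]
            if (self.entity_links.get(zh_rel.arg1) == en_rel.arg1
                    and self.entity_links.get(zh_rel.arg2) == en_rel.arg2):
                consistent.append((zh_id, en_id))
        return consistent


# Invented example pair: "比尔·盖茨创立了微软" / "Bill Gates founded Microsoft"
pair = AlignedPair(
    zh_entities={
        "zh_e1": Entity("zh_e1", "比尔·盖茨", "PER"),
        "zh_e2": Entity("zh_e2", "微软", "ORG"),
    },
    en_entities={
        "en_e1": Entity("en_e1", "Bill Gates", "PER"),
        "en_e2": Entity("en_e2", "Microsoft", "ORG"),
    },
    zh_relations={"zh_r1": Relation("zh_r1", "FOUNDER", "zh_e1", "zh_e2")},
    en_relations={"en_r1": Relation("en_r1", "FOUNDER", "en_e1", "en_e2")},
    entity_links={"zh_e1": "en_e1", "zh_e2": "en_e2"},
    relation_links={"zh_r1": "en_r1"},
)

print(pair.consistent_relation_links())
```

Keeping the entity links and relation links as separate mappings lets a consumer check that each aligned relation is backed by aligned arguments, which is one plausible way to compare how the two languages express the same relation.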