Bilingual Parallel Active Learning between Chinese and English

Longhua Qian, Jiaxin Liu, Guodong Zhou, Qiaoming Zhu
DOI: https://doi.org/10.1007/978-3-319-50496-4_10
2016-01-01
Abstract: Active learning is an effective machine learning paradigm that can significantly reduce the manual labor of annotating NLP corpora while still achieving competitive performance. Previous studies on active learning have focused on corpora in a single language or in two languages translated from each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL), in which an instance-level parallel Chinese and English corpus adapted from OntoNotes is augmented for relation extraction, and both the seeds and the unlabeled instances jointly selected at each iteration are parallel between the two languages in order to enhance active learning. Experimental results on relation classification on this corpus demonstrate that BPAL significantly outperforms monolingual active learning. Moreover, the success of BPAL suggests a new way of annotating parallel corpora for NLP tasks so as to induce two high-performance classifiers, one for each language.
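The joint selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the two classifiers' outputs are combined, so everything here is an assumption made for illustration: the function names `entropy` and `select_parallel_batch` are hypothetical, uncertainty is measured by Shannon entropy, and the two languages' scores are simply summed. The only property taken from the abstract is that instances are chosen as parallel pairs, so the same index is selected on both the Chinese and the English side.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted label distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_parallel_batch(
    zh_probs: List[Sequence[float]],
    en_probs: List[Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the parallel pairs with the highest joint uncertainty.

    zh_probs[i] and en_probs[i] are the label distributions predicted by the
    Chinese and English classifiers for the i-th parallel instance pair.
    Summing the two entropies is an assumption of this sketch, not a detail
    taken from the paper.
    """
    if len(zh_probs) != len(en_probs):
        raise ValueError("pools must be instance-level parallel")
    scores = [entropy(z) + entropy(e) for z, e in zip(zh_probs, en_probs)]
    # Rank pair indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Toy pool of three parallel instance pairs with two relation labels each.
zh_pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
en_pool = [[0.8, 0.2], [0.9, 0.1], [0.5, 0.5]]
selected = select_parallel_batch(zh_pool, en_pool, batch_size=2)
print(selected)  # [2, 1]
```

In a full loop under these assumptions, the selected pairs would be annotated once, added to both training sets, and both classifiers retrained before the next iteration.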