Multi-Instance Active Learning for Relation Extraction

Meilin Cao, Jiqiong Jiang, Jing Zhang
DOI: https://doi.org/10.1109/ictai56018.2022.00072
2022-01-01
Abstract: Achieving high-accuracy relation extraction usually requires a large, well-annotated corpus for model training, and building one is extremely time-consuming and labor-intensive. Consequently, distant supervision, implemented as a multi-instance learning paradigm, was proposed to build models from bags using automatically labeled corpora. This unavoidably introduces substantial noise into the bag labels, which degrades model performance. To address this problem, this paper proposes a novel multi-instance active learning approach for relation extraction, which aims to improve the classification performance of the learned models while using as few labeled training samples as possible. The proposed active learning strategies for bag selection are based on the similarity between bags, an improved Fisher matrix, and a combination of the two (hybrid), respectively. Extensive experimental results on two real-world datasets with three base learners consistently show that the proposed active learning methods achieve better relation extraction performance than three baseline methods. Furthermore, the hybrid bag selection strategy performs best.