Can We Have Both Fish and Bear's Paw? Improving Performance, Reliability, and both of them for Relation Extraction under Label Shift

Yu Hong, Zhixu Li, Jianfeng Qu, Jiaqing Liang, Yi Luo, Miyu Zhang, Yanghua Xiao, Wei Wang
DOI: https://doi.org/10.1145/3511808.3557251
2022-01-01
Abstract: Neural Relation Extraction (RE) models need large amounts of labeled data for effective training, which mainly comes from automatic labeling by Distant Supervision (DS). Though DS is fast and easy, the label shift problem inevitably arises, i.e., the label distribution of the DS-generated training set differs substantially from that of the real world (i.e., the test set). According to our observations, label shift not only degrades performance but also undermines the reliability of DS-RE models by causing poor confidence estimation. In this paper, we make contributions by answering the following three questions: 1) How can the performance of DS-RE models be improved under label shift? 2) How can their reliability be ensured under label shift? 3) How can both the performance and the reliability of DS-RE models be improved under label shift? To the best of our knowledge, this is the first paper to study both the performance and the reliability of DS-RE models under label shift. Experimental results show significant improvements on two real-world datasets and six popular neural RE models, a step further towards high-performance and reliable RE systems under real-world label-shift conditions.