Abstract: Distant supervision has been extensively explored in NLP tasks such as named entity recognition (NER), but a major obstacle is the inevitable noise in distant labels, which are assigned without human supervision. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we identify two types of bias that prior work overlooked and that degrade the performance of distantly supervised NER (DSNER). First, we characterize the noise concealed in the distant labels as highly structured rather than fully random. Second, the self-training framework itself introduces an inherent bias that causes errors in both sample selection and, ultimately, prediction. To address these problems, we propose a novel self-training framework, dubbed DesERT. The framework augments the conventional NER prediction pathway into a dual form that adapts the sample-selection process to conform to the innate structure of this distributional bias. The other crucial component of DesERT is a debiased module that aims to enhance the token representations and hence the quality of the pseudo-labels. Extensive experiments are conducted to validate DesERT. The results show that our framework establishes a new state of the art, achieving an average F1 improvement of +2.22% on five standard benchmark datasets. Lastly, DesERT demonstrates its effectiveness on a new DSNER benchmark in which additional distant supervision comes from the ChatGPT model.

Debiased and Denoised Entity Recognition from Distant Supervision.