Causal Interventions-based Few-Shot Named Entity Recognition

Zhen Yang, Yongbin Liu, Chunping Ouyang
2023-05-03
Abstract: Few-shot named entity recognition (NER) systems aim to recognize new classes of entities from a few labeled samples. A significant challenge in the few-shot regime is that models are more prone to overfitting than in tasks with abundant samples. This heavy overfitting is mainly caused by spurious correlations that arise from the selection bias of the few samples. To alleviate spurious correlations in few-shot NER, we propose a causal intervention-based few-shot NER method. Based on the prototypical network, the method intervenes in the context and the prototype via backdoor adjustment during training. In particular, intervening in the context in the one-shot scenario is very difficult, so we intervene in the prototype via incremental learning, which can also avoid catastrophic forgetting. Our experiments on different benchmarks show that our approach achieves new state-of-the-art results (up to 29% absolute improvement, and 12% on average across all tasks).
Computation and Language
What problem does this paper attempt to address?
This paper addresses the spurious correlations caused by sample selection bias in few-shot Named Entity Recognition (NER). When training data is very limited, the model tends to overfit to spurious associations between specific contexts and entity labels, which hurts its ability to generalize to new data. For example, in a few-shot scenario the model may wrongly associate "square" with the "animal" category simply because the training set happens to contain a few examples about "pigeons on the square". This correlation is not an inherent feature of the entity category; it is an artifact of sample selection bias.

To alleviate this problem, the paper proposes a method based on causal intervention. The method blocks spurious correlations by intervening in the context and in the prototypes, thereby improving the model's generalization. The specific techniques are:

1. **Context-based causal intervention**: Generate new samples by replacing the entities in sentences, which reduces the influence of the context on the predicted labels and keeps the model from overfitting to specific context-label associations.
2. **Prototype-based causal intervention**: In the one-shot task there are no additional entities available for intervention, so the method combines the knowledge from the previous stage with the representation from the current stage to form the final prototype representation. This avoids catastrophic forgetting and reduces spurious correlations.
3. **Sample re-weighting**: Compute the contribution weight of each support sample to the query sample, so that the model computes entity prototypes more accurately and the distribution gap between the source domain and the target domain is further reduced.

With these techniques, the paper reports significant performance improvements on different benchmarks. In the 5-shot task in particular, the improvement ranges from 11% to 29%, which indicates that the method is effective at mitigating spurious correlations in few-shot NER.
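The three techniques described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the softmax-over-dot-product weighting, and the mixing coefficient `alpha` are all assumptions chosen for illustration, since the summary does not give the exact formulas.

```python
import math

def intervene_on_context(tokens, span, same_type_entities):
    """Context intervention (illustrative): keep the context fixed and
    swap the entity span for other entities of the same type."""
    start, end = span  # end index is exclusive
    return [tokens[:start] + list(ent) + tokens[end:] for ent in same_type_entities]

def reweighted_prototype(support, query):
    """Sample re-weighting (assumed form): weight each support vector by a
    softmax over its dot product with the query, then take the weighted mean."""
    scores = [sum(s_i * q_i for s_i, q_i in zip(s, query)) for s in support]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * s[d] for w, s in zip(weights, support)) for d in range(len(query))]

def merge_prototypes(previous, current, alpha=0.5):
    """Prototype intervention (assumed form): convex combination of the
    previous-stage prototype and the current-stage prototype."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(previous, current)]

# Usage with toy data (all values are made up for illustration).
sentence = ["pigeons", "on", "the", "square"]
augmented = intervene_on_context(sentence, (0, 1), [["stray", "dogs"], ["cats"]])

support_vectors = [[1.0, 2.0], [1.0, 2.0]]
proto = reweighted_prototype(support_vectors, query=[0.5, 0.5])
merged = merge_prototypes([1.0, 0.0], [0.0, 1.0], alpha=0.5)
```

In this sketch, replacing "pigeons" with other animal entities while keeping "on the square" fixed is what discourages a context-label shortcut. Blending the previous and current prototypes is one simple way to retain earlier knowledge when only one example per class is available.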