E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition

Zhen Zhang, Mengting Hu, Shiwan Zhao, Minlie Huang, Haotian Wang, Lemao Liu, Zhirui Zhang, Zhe Liu, Bingzhe Wu
2023-05-29
Abstract: Most named entity recognition (NER) systems focus on improving model performance, ignoring the need to quantify model uncertainty, which is critical to the reliability of NER systems in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to explicitly model predictive uncertainty for classification tasks. However, directly applying EDL to NER applications faces two challenges, i.e., the problems of sparse entities and OOV/OOD entities in NER tasks. To address these challenges, we propose a trustworthy NER framework named E-NER by introducing two uncertainty-guided loss terms to the conventional EDL, along with a series of uncertainty-guided training strategies. Experiments show that E-NER can be applied to multiple NER paradigms to obtain accurate uncertainty estimation. Furthermore, compared to state-of-the-art baselines, the proposed method achieves a better OOV/OOD detection performance and better generalization ability on OOV entities.
Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is that current named entity recognition (NER) systems do not quantify model uncertainty, which undermines their reliability in open environments. Specifically:

1. **Model reliability**: Most existing NER systems focus on improving performance (e.g., accuracy and F1 score) while ignoring uncertainty quantification, even though uncertainty estimation is crucial for the reliability of NER systems in open environments.
2. **Sparse entity problem**: Entities make up only a small fraction of the tokens in a text corpus. For example, in the commonly used CoNLL2003 dataset, only 16.8% of the words belong to entities; all remaining words are labeled with the non-entity "Other" (O) class. This imbalance can lead to overfitting and poor performance on entity types.
3. **OOV/OOD entity problem**: In open environments, NER training and test data often contain unseen, out-of-vocabulary (OOV) words or out-of-domain (OOD) entities, but the optimization objectives of current EDL methods do not explicitly model this information.

To address these problems, the authors propose a trustworthy NER framework, E-NER, which extends conventional EDL with two uncertainty-guided loss terms and a series of uncertainty-guided training strategies:

- **Uncertainty-guided importance weighting (IW) loss**: Samples with higher predictive uncertainty (lower belief in the gold class) receive larger weights, so training concentrates on the entities of interest (such as person names and locations) rather than on easy tokens.
- **Additional regularization term**: Low uncertainty on misclassified samples is penalized, which pushes the model to assign higher uncertainty where its predicted label is likely wrong and thereby improves OOV/OOD entity detection.
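The sparse entity statistic above is simply the share of tokens that carry a non-O tag. A minimal sketch, assuming BIO-style tags stored as lists of strings; the function name `entity_token_ratio` and the toy sentences are hypothetical illustrations, not CoNLL2003 data or the paper's code:

```python
from typing import Iterable, Sequence


def entity_token_ratio(tag_sequences: Iterable[Sequence[str]]) -> float:
    """Fraction of tokens whose BIO tag is not 'O' (i.e., that belong to an entity)."""
    total = 0
    entity = 0
    for tags in tag_sequences:
        total += len(tags)
        entity += sum(1 for tag in tags if tag != "O")
    # Guard against an empty corpus.
    return entity / total if total else 0.0


# Toy, hand-made example (hypothetical sentences, not from any dataset).
toy = [
    ["B-PER", "I-PER", "O", "O", "B-LOC", "O"],
    ["O", "O", "O", "O"],
]
ratio = entity_token_ratio(toy)
print(f"entity tokens: {ratio:.1%}, O tokens: {1 - ratio:.1%}")  # entity tokens: 30.0%, O tokens: 70.0%
```

At the 16.8% reported for CoNLL2003, roughly five out of six tokens are O, so an unweighted per-token loss draws most of its terms from the O class.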
Through these improvements, E-NER not only improves the quality of uncertainty estimation but also enhances robustness and generalization for OOV/OOD entities. Experimental results show that E-NER obtains more accurate uncertainty estimates across multiple NER paradigms and outperforms existing methods in OOV/OOD detection and in generalization to OOV entities.

### Formula Summary

1. **Dirichlet distribution parameters**: \[ \alpha^{(i)} = e^{(i)} + 1 \] where \(e^{(i)}\) is the non-negative evidence vector for token \(i\) and \(\alpha^{(i)}\) is the parameter vector of its Dirichlet distribution.
2. **Dirichlet probability density function**: \[ \text{Dir}(p^{(i)} \mid \alpha^{(i)}) = \frac{1}{B(\alpha^{(i)})} \prod_{c=1}^{C} \left(p_c^{(i)}\right)^{\alpha_c^{(i)} - 1} \] where \(B(\alpha^{(i)})\) is the multivariate Beta function.
3. **Belief mass and uncertainty mass**: \[ b_c^{(i)} = \frac{e_c^{(i)}}{S^{(i)}}, \quad u^{(i)} = \frac{C}{S^{(i)}} \] where \(S^{(i)} = \sum_{c=1}^{C} \alpha_c^{(i)}\) is the Dirichlet strength and \(C\) is the number of classes; the belief masses and the uncertainty mass sum to 1.
4. **Importance-weighted classification loss**: \[ L_{\text{IW}}^{(i)} = \sum_{c=1}^{C} w_c^{(i)} \left( \psi(S^{(i)}) - \psi(\alpha_c^{(i)}) \right) \] where \(\psi(\cdot)\) is the digamma function and \(w^{(i)} = (1 - b^{(i)}) \odot y^{(i)}\) with \(y^{(i)}\) the one-hot gold label, i.e., \(w_c^{(i)} = (1 - b_c^{(i)})\, y_c^{(i)}\).
5. **Uncertainty-mass optimization loss**: \[ L_{\text{UNM}} = -\lambda_2 \sum_{i \in \{\hat{y}^{(i)} \neq y^{(i)}\}} \log(u^{(i)}) \] where \(\hat{y}^{(i)}\) is the predicted label, and \(\lambda_2 = \la
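The per-token quantities in the formula summary can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the evidence vector is already non-negative (the activation that produces it is not specified here), takes the predicted label as the class with the most evidence, implements the digamma function with a standard recurrence plus asymptotic series to avoid third-party dependencies, and covers only the misclassified-token term of the uncertainty-mass loss as written above, with `lam2` supplied by the caller because its definition is cut off in this summary. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def dirichlet_stats(evidence: Sequence[float]) -> Tuple[List[float], float, List[float], float]:
    """Return (alpha, S, belief masses b, uncertainty mass u) for one token."""
    alpha = [e + 1.0 for e in evidence]          # alpha = e + 1
    strength = sum(alpha)                        # S = sum_c alpha_c
    belief = [e / strength for e in evidence]    # b_c = e_c / S
    uncertainty = len(evidence) / strength       # u = C / S
    return alpha, strength, belief, uncertainty


def iw_loss(evidence: Sequence[float], one_hot: Sequence[int]) -> float:
    """Importance-weighted loss: sum_c (1 - b_c) * y_c * (psi(S) - psi(alpha_c))."""
    alpha, strength, belief, _ = dirichlet_stats(evidence)
    return sum(
        (1.0 - b) * y * (digamma(strength) - digamma(a))
        for a, b, y in zip(alpha, belief, one_hot)
    )


def unm_misclassified_term(batch_evidence: Sequence[Sequence[float]],
                           gold: Sequence[int], lam2: float) -> float:
    """-lam2 * sum of log(u) over tokens whose predicted label differs from gold."""
    total = 0.0
    for evidence, label in zip(batch_evidence, gold):
        _, _, _, u = dirichlet_stats(evidence)
        # Assumption: prediction = class with the most evidence (argmax of alpha / S).
        predicted = max(range(len(evidence)), key=lambda c: evidence[c])
        if predicted != label:
            total += math.log(u)
    return -lam2 * total
```

For example, with three classes and evidence `[4.0, 0.0, 0.0]`, `dirichlet_stats` gives alpha = [5, 1, 1], S = 7, b = [4/7, 0, 0] and u = 3/7, so belief and uncertainty sum to 1. Because the weight (1 - b_c) shrinks as evidence for the gold class grows, confidently correct tokens contribute little to `iw_loss`, while misclassified tokens with small u are penalized most by the UNM term, since -log(u) grows as u shrinks.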