Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen
2024-06-19
Abstract: Nested Named Entity Recognition (NNER) focuses on recognizing overlapping entities. Annotated corpora for NNER are scarcer than those for Flat Named Entity Recognition (FNER). Data augmentation is an effective way to compensate for insufficient annotated data. However, data augmentation methods for NNER remain largely unexplored. Because NNER involves nested entities, existing data augmentation methods cannot be applied to NNER tasks directly. Therefore, in this work, we focus on data augmentation for NNER and adopt a more expressive structure, Composited-Nested-Label Classification (CNLC), in which constituents combine nested words and nested labels, to model nested entities. The dataset is then augmented using Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) to select generated data more efficiently. Experimental results demonstrate that this approach improves performance on ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
Computation and Language
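The abstract does not spell out the exact CNLC encoding, so the following is only an illustrative sketch under stated assumptions. It assumes that each token receives one composite label formed by joining the BIO tags of every entity covering it, ordered from the outermost to the innermost entity, with `|` as the separator. It also assumes that spans are properly nested, with no crossing boundaries. The names `composite_labels`, `decode_spans`, and `SEP` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); spans are assumed properly nested.
Span = Tuple[int, int, str]

SEP = "|"  # hypothetical separator between nesting layers


def composite_labels(tokens: List[str], spans: List[Span]) -> List[str]:
    """Give each token one composite label built from every span covering it."""
    # Outermost first: longer spans precede shorter ones, ties broken by start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    labels = []
    for i in range(len(tokens)):
        tags = [
            f"{'B' if i == start else 'I'}-{etype}"
            for start, end, etype in ordered
            if start <= i < end
        ]
        labels.append(SEP.join(tags) if tags else "O")
    return labels


def decode_spans(labels: List[str]) -> List[Span]:
    """Recover nested spans by reading each nesting layer as a BIO sequence."""
    layers = [[] if lab == "O" else lab.split(SEP) for lab in labels]
    depth = max((len(tags) for tags in layers), default=0)
    spans: List[Span] = []
    for k in range(depth):
        start, etype = None, None
        # A trailing empty layer acts as a sentinel that closes any open span.
        for i, tags in enumerate(layers + [[]]):
            tag = tags[k] if k < len(tags) else "O"
            if start is not None and tag != f"I-{etype}":
                spans.append((start, i, etype))
                start = None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return sorted(spans)


tokens = ["the", "University", "of", "California", "campus"]
spans = [(1, 4, "ORG"), (3, 4, "GPE")]
labels = composite_labels(tokens, spans)
print(labels)  # ['O', 'B-ORG', 'I-ORG', 'I-ORG|B-GPE', 'O']
print(decode_spans(labels) == sorted(spans))  # True
```

Under these assumptions the encoding is reversible. Sentences generated with composite labels can therefore be decoded back into nested entity annotations. This is what would let a flat, one-label-per-token augmentation pipeline be reused for nested data.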
What problem does this paper attempt to address?
The paper focuses on Nested Named Entity Recognition (NNER), a task more complex than Flat Named Entity Recognition (FNER) because it requires identifying overlapping entities. A key challenge for NNER is the scarcity of annotated data. To address this, the paper proposes the Composited-Nested-Learning (CNL) method, which combines data augmentation with a structure called Composited-Nested-Label Classification (CNLC) to better handle nested entities. Because CNLC allows a word to carry multiple labels, it overcomes the limitation that existing data augmentation techniques cannot be applied directly to NNER. The paper also introduces the Confidence Filtering Mechanism (CFM), which selects high-confidence samples from the generated data to improve the quality of the augmented data. Experimental results show that this approach improves model performance on the ACE2004 and ACE2005 datasets and mitigates the impact of sample imbalance.

The main contributions of the paper are as follows:
1. Using CNLC to model nested words and labels in NNER, which makes data augmentation applicable to the task.
2. Proposing CFM to select high-confidence samples, improving the quality of the augmented data.
3. Improving the performance of existing models and alleviating sample imbalance through the proposed framework.
4. Releasing the augmented dataset as open source for other researchers.

Additionally, the paper discusses the limitations of existing data augmentation methods for NNER and compares the proposed approach with other NER models, providing evidence of its effectiveness.
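The summary does not say how CFM computes confidence, so the following is an illustration only. It assumes that each generated sample carries a per-token probability assigned by a tagger trained on the original data. A sample is scored by its least confident token, and only well-formed samples whose score reaches a threshold are kept. The class `GeneratedSample`, the helper functions, and the default threshold of 0.9 are hypothetical choices, not the paper's settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneratedSample:
    tokens: List[str]
    labels: List[str]         # composite labels produced by the generator
    token_probs: List[float]  # hypothetical: a tagger's probability for each label


def is_well_formed(sample: GeneratedSample) -> bool:
    """Reject empty generations and ones whose tokens, labels, and scores misalign."""
    n = len(sample.tokens)
    return n > 0 and len(sample.labels) == n and len(sample.token_probs) == n


def sample_confidence(sample: GeneratedSample) -> float:
    """Score a well-formed sample by its least confident token."""
    return min(sample.token_probs)


def confidence_filter(samples: List[GeneratedSample],
                      threshold: float = 0.9) -> List[GeneratedSample]:
    """Keep well-formed generated samples whose confidence reaches the threshold."""
    return [s for s in samples
            if is_well_formed(s) and sample_confidence(s) >= threshold]


generated = [
    GeneratedSample(["the", "University", "of", "California"],
                    ["O", "B-ORG", "I-ORG", "I-ORG|B-GPE"],
                    [0.98, 0.95, 0.93, 0.91]),
    GeneratedSample(["a", "Smith", "visit"],
                    ["O", "B-PER", "O"],
                    [0.97, 0.42, 0.96]),
    GeneratedSample(["broken"], ["O", "B-PER"], [0.99]),  # length mismatch
]
kept = confidence_filter(generated, threshold=0.9)
print([s.tokens for s in kept])  # [['the', 'University', 'of', 'California']]
```

Taking the minimum is a conservative choice, because a single doubtful label is enough to reject a sentence. Averaging the probabilities instead would admit more samples at the cost of more label noise. Which trade-off suits NNER augmentation is an empirical question.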