Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen

2024-06-19

Abstract: Nested Named Entity Recognition (NNER) addresses the recognition of overlapping entities. Compared with Flat Named Entity Recognition (FNER), annotated resources for NNER are scarce. Data augmentation is an effective way to compensate for an insufficient annotated corpus, but augmentation methods for NNER remain largely unexplored: because NNER involves nested entities, existing data augmentation methods cannot be applied to it directly. In this work, we therefore focus on data augmentation for NNER and resort to a more expressive structure, Composited-Nested-Label Classification (CNLC), in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for more efficient selection of generated data. Experimental results demonstrate that this approach yields improvements on ACE2004 and ACE2005 and alleviates the impact of sample imbalance.

Computation and Language

What problem does this paper attempt to address?

The paper focuses on the problem of Nested Named Entity Recognition (NNER), which is a more complex task than Flat Named Entity Recognition (FNER) because it involves identifying overlapping entities. One of the challenges faced by NNER is the lack of sufficient annotated data. To address this issue, the paper proposes the Composited-Nested-Learning (CNL) method, which combines data augmentation with a structure called Composited-Nested-Label Classification (CNLC) to better handle nested entities. CNLC allows a word to have multiple labels, thus overcoming the limitations of existing data augmentation techniques that cannot be directly applied to NNER.

The paper also introduces a selection mechanism called the Confidence Filtering Mechanism (CFM) to choose high-confidence samples from the generated data, aiming to improve the quality of data augmentation. Experimental results demonstrate that this approach improves model performance on the ACE2004 and ACE2005 datasets and mitigates the impact of sample imbalance.

The main contributions of the paper are as follows:

1. Using CNLC to handle nested words and labels in NNER, addressing the NNER problem through data augmentation.
2. Proposing CFM to select high-confidence samples, enhancing the quality of augmented data.
3. Improving existing model performance and alleviating sample imbalance issues through the framework.
4. Releasing the augmented dataset as open-source for other researchers to use.

Additionally, the paper discusses the limitations of existing data augmentation methods in NNER and compares the approach with other NER models, providing evidence of its effectiveness.
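To make the two ideas above more concrete, here is a minimal Python sketch of (a) giving each token a single composite tag built from every entity span that covers it, and (b) keeping only generated samples whose confidence clears a threshold. This is an illustration under stated assumptions, not the paper's implementation: the BIO prefixes, the `|` separator, the `Span` and `GeneratedSample` types, the function names, and the `0.9` threshold are all hypothetical, and the summary does not say how the confidence score is computed.

```python
from dataclasses import dataclass

# Hypothetical span representation: (start, end_exclusive, label).
Span = tuple[int, int, str]


def composite_labels(tokens: list[str], spans: list[Span], sep: str = "|") -> list[str]:
    """Give each token one composite tag that joins the tags of all spans covering it.

    Assumption: BIO-style tags joined by `sep`, ordered from outer to inner span.
    """
    per_token: list[list[str]] = [[] for _ in tokens]
    # Sort by start, then longest first, so an outer span is visited before
    # any span nested inside it.
    for start, end, label in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        for i in range(start, end):
            prefix = "B" if i == start else "I"
            per_token[i].append(f"{prefix}-{label}")
    return [sep.join(tags) if tags else "O" for tags in per_token]


@dataclass
class GeneratedSample:
    tokens: list[str]
    labels: list[str]
    confidence: float  # assumed to be a score in [0, 1] from some scoring model


def filter_by_confidence(samples: list[GeneratedSample], threshold: float = 0.9) -> list[GeneratedSample]:
    """Keep only generated samples whose confidence is at least `threshold`."""
    return [s for s in samples if s.confidence >= threshold]


if __name__ == "__main__":
    tokens = ["the", "University", "of", "Chicago", "campus"]
    # "University of Chicago" (ORG) contains the nested entity "Chicago" (GPE).
    spans = [(1, 4, "ORG"), (3, 4, "GPE")]
    print(composite_labels(tokens, spans))
```

With composite tags of this kind, a nested sentence can be handled as an ordinary token-level sequence, which is one plausible way augmentation methods designed for flat NER could be reused; whether the paper encodes labels exactly this way is not stated in the summary.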

A Boundary-aware Neural Model for Nested Named Entity Recognition

A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition

Candidate region aware nested named entity recognition

Nested Named Entity Recognition Via an Independent-Layered Pretrained Model

Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes

Structure and Label Constrained Data Augmentation for Cross-domain Few-shot NER

Hierarchical Region Learning for Nested Named Entity Recognition

LACNNER: Lexicon-Aware Character Representation for Chinese Nested Named Entity Recognition

Recognizing Nested Named Entity Based on the Neural Network Boundary Assembling Model

Data Augmentation for Cross-Domain Named Entity Recognition

A Chinese Nested Named Entity Recognition Approach Using Sequence Labeling

Cascaded Models for Better Fine-Grained Named Entity Recognition

Nested Entity Recognition Method Based on Multidimensional Features and Fuzzy Localization

AERNs: Attention-Based Entity Region Networks for Multi-Grained Named Entity Recognition

A Unified MRC Framework for Named Entity Recognition

3Rs: Data Augmentation Techniques Using Document Contexts For Low-Resource Chinese Named Entity Recognition

Recognizing Nested Entities from Flat Supervision: A New NER Subtask, Feasibility and Challenges

ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition

Prompt-Based Data Augmentation Framework for Few-Shot Named Entity Recognition

Context-Aware Attentive Multilevel Feature Fusion for Named Entity Recognition