Data Augmentation with Knowledge Graph-to-Text and Virtual Adversary for Specialized-Domain Chinese NER
Zhiguang Wang, Siying Hu, Qiang Lu, Zhiqiang Liu, Tian Wang, Bingbin Zhang
DOI: https://doi.org/10.1109/IJCNN60899.2024.10650306
2024-06-30
Abstract: Chinese Named Entity Recognition (CNER) has been extensively researched in general domains, and it is receiving growing attention in specialized fields as practical engineering applications emerge. However, CNER performance in specialized domains, such as petroleum refining and entertainment, remains moderate because annotated data is scarce. In this paper, we propose two improvements that address this scarcity. First, we propose a novel data augmentation method, Knowledge Graph Text Alignment with BART (KGTA-BART), which, for the first time, introduces a knowledge graph extracted from structured and semi-structured data, aligns its graph information with the semantic information of annotated text, and then generates high-quality text from the knowledge graph using the BART model. Expanding the dataset in this way helps the model learn more entity features and improves its effectiveness when annotated data is scarce. Second, we develop a CNER model, Virtual Adversary with BART (VA-BART), which uses BART as an encoder and applies a virtual adversary to CNER. This improves the capture of contextual information in text when annotated data is scarce and enhances the model’s generalization ability. Experimental results demonstrate that VA-BART combined with KGTA-BART achieves significant improvements over the baselines on domain-specific Chinese data.
Computer Science
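The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' KGTA-BART implementation: the `<H>/<R>/<T>` linearization format, the function names `linearize` and `bio_tags`, the entity labels, and the example sentence (a stand-in for model-generated text) are all assumptions made for illustration. The sketch shows only why text generated from a knowledge graph is useful for NER: the entities in each triple are already known, so character-level BIO labels can be derived by string matching.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def linearize(triple: Triple) -> str:
    """Flatten a triple into a tagged string that a seq2seq model could take as input.

    The <H>/<R>/<T> markers are a hypothetical format, not taken from the paper.
    """
    head, relation, tail = triple
    return f"<H> {head} <R> {relation} <T> {tail}"


def bio_tags(sentence: str, entities: List[Tuple[str, str]]) -> List[str]:
    """Assign character-level BIO tags by exact matching of known entity strings."""
    tags = ["O"] * len(sentence)
    for surface, label in entities:
        if not surface:
            continue  # an empty string would match everywhere
        start = sentence.find(surface)
        while start != -1:
            end = start + len(surface)
            # Only tag spans that do not overlap an earlier match.
            if all(t == "O" for t in tags[start:end]):
                tags[start] = f"B-{label}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{label}"
            start = sentence.find(surface, end)
    return tags


if __name__ == "__main__":
    triple = ("催化裂化", "产物", "汽油")  # catalytic cracking, product, gasoline
    print(linearize(triple))
    # Stand-in for a sentence a generator might produce from the triple.
    generated = "催化裂化的主要产物是汽油。"
    labels = bio_tags(generated, [(triple[0], "PROCESS"), (triple[2], "PRODUCT")])
    for char, tag in zip(generated, labels):
        print(char, tag)
```

Exact string matching is the simplest possible labelling choice. A generator that paraphrases entity names would need a softer alignment step, which this sketch does not attempt.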