Abstract: Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages. However, few studies have focused on the scenario where source-language labeled data is also limited in some specific domains. A common approach for this scenario is to generate more training data through translation or generation-based data augmentation methods. Unfortunately, we find that simply combining source-language data with its translations cannot fully exploit the translated data, and the improvements obtained are limited. In this paper, we describe ConCNER, our novel dual-contrastive framework for cross-lingual NER under the scenario of limited source-language labeled data. Specifically, based on the source-language samples and their translations, we design two contrastive objectives for cross-lingual NER at different grammatical levels: Translation Contrastive Learning (TCL), which pulls together the sentence representations of translated sentence pairs, and Label Contrastive Learning (LCL), which pulls together the token representations that share the same label. Furthermore, we apply knowledge distillation, where the NER model trained above serves as the teacher to train a student model on unlabeled target-language data so that it better fits the target language. We conduct extensive experiments on a wide variety of target languages, and the results demonstrate that ConCNER tends to outperform multiple baseline methods. For reproducibility, our code is available at https://github.com/GKLMIP/ConCNER.
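The two contrastive objectives named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: an InfoNCE-style loss with in-batch negatives for TCL, a supervised-contrastive-style loss for LCL, cosine similarity, and a `temperature` parameter. The function names are hypothetical and are not taken from the ConCNER code base.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def translation_contrastive_loss(src, tgt, temperature=0.1):
    """Assumed InfoNCE-style form of TCL.

    src[i] and tgt[i] are sentence vectors of a translation pair (positive);
    every other tgt[j] in the batch is treated as a negative.
    """
    total = 0.0
    for i, s in enumerate(src):
        logits = [cosine(s, t) / temperature for t in tgt]
        log_denom = math.log(sum(math.exp(x) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(src)


def label_contrastive_loss(tokens, labels, temperature=0.1):
    """Assumed supervised-contrastive form of LCL.

    Tokens that share a label are positives for each other; all other
    tokens in the batch are negatives. Anchors with no positive are skipped.
    """
    n = len(tokens)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {
            j: cosine(tokens[i], tokens[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(x) for x in logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy vectors standing in for encoder outputs (hypothetical values).
src_sents = [[1.0, 0.0], [0.0, 1.0]]
tgt_aligned = [[1.0, 0.0], [0.0, 1.0]]
tgt_shuffled = [[0.0, 1.0], [1.0, 0.0]]

token_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels_consistent = ["PER", "PER", "O", "O"]
labels_scrambled = ["PER", "O", "PER", "O"]

tcl_aligned = translation_contrastive_loss(src_sents, tgt_aligned)
tcl_shuffled = translation_contrastive_loss(src_sents, tgt_shuffled)
lcl_consistent = label_contrastive_loss(token_vecs, labels_consistent)
lcl_scrambled = label_contrastive_loss(token_vecs, labels_scrambled)
```

In this toy setup both losses are lower when representations agree with the supervision signal (aligned translation pairs, tokens clustered by label), which is the behavior a contrastive objective is meant to reward. A real implementation would compute these losses on encoder outputs with a tensor library, combine them with the NER loss, and then apply the distillation step the abstract describes; none of that is shown here.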

Building Low-Resource NER Models Using Non-Speaker Annotation

A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

Low-Resource Named Entity Recognition Without Human Annotation

A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages

Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Improving Low Resource Named Entity Recognition Using Cross-lingual Knowledge Transfer

A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages

Low-Resource Adaptation of Neural NLP Models

MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

3Rs: Data Augmentation Techniques Using Document Contexts For Low-Resource Chinese Named Entity Recognition

A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity Recognition

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Low-Resource Named Entity Recognition via the Pre-Training Model

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

An Experimental Study on Data Augmentation Techniques for Named Entity Recognition on Low-Resource Domains

Augmenting NER Datasets with LLMs: Towards Automated and Refined Annotation

Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields

Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition

Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

Low-Resource NER by Data Augmentation with Prompting