Abstract: Natural language understanding (NLU) is the task of semantic decoding of human languages by machines. NLU models rely heavily on large amounts of training data to ensure good performance. However, many languages and domains have very few data resources and domain experts. It is therefore necessary to overcome the data scarcity challenge that arises when very few or even zero training samples are available. In this thesis, we focus on developing cross-lingual and cross-domain methods to tackle these low-resource issues. First, we propose to improve the model's cross-lingual ability by focusing on the task-related keywords, enhancing the model's robustness, and regularizing the representations. We find that the representations for low-resource languages can be easily and greatly improved by focusing on just the keywords. Second, we present Order-Reduced Modeling methods for cross-lingual adaptation, and find that modeling partial word orders instead of the whole sequence improves both the model's robustness against word order differences between languages and the transfer of task knowledge to low-resource languages. Third, we propose to leverage different levels of domain-related corpora and additional masking of data during pre-training for cross-domain adaptation, and discover that more challenging pre-training can better address the domain discrepancy issue in task knowledge transfer. Finally, we introduce a coarse-to-fine framework, Coach, and a cross-lingual and cross-domain parsing framework, X2Parser. Coach decomposes the representation learning process into coarse-grained and fine-grained feature learning, and X2Parser simplifies hierarchical task structures into flattened ones. We observe that simplifying task structures makes representation learning more effective for low-resource languages and domains.

Cross-domain NER under a Divide-and-Transfer Paradigm

Three Heads Are Better Than One: Improving Cross-Domain NER with Progressive Decomposed Network

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

Semi-Supervised Disentangled Framework for Transferable Named Entity Recognition

Transfer Learning and Deep Domain Adaptation

Cross-domain Named Entity Recognition via Graph Matching

Cross-domain NER in the data-poor scenarios for human mobility knowledge

Effective Transfer Learning for Low-Resource Natural Language Understanding

Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment

UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data

Searching for Optimal Subword Tokenization in Cross-domain NER

What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis

An Instance Transfer based Approach Using Enhanced Recurrent Neural Network for Domain Named Entity Recognition

Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text

A Teacher-Student Approach to Cross-Domain Transfer Learning with Multi-level Attention

Research on Named Entity Recognition for Spoken Language Understanding Using Adversarial Transfer Learning

Exploring and Predicting Transferability across NLP Tasks

Using Domain Knowledge for Low Resource Named Entity Recognition

Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition

A Research Toward Chinese Named Entity Recognition Based on Transfer Learning

Neural Adaptation Layers for Cross-domain Named Entity Recognition