Abstract: Natural language understanding (NLU) is the task of semantic decoding of human languages by machines. NLU models rely heavily on large amounts of training data to perform well. However, many languages and domains have very few data resources and domain experts. It is therefore necessary to overcome the data scarcity challenge that arises when very few or even zero training samples are available. In this thesis, we focus on developing cross-lingual and cross-domain methods to tackle these low-resource issues. First, we propose to improve the model's cross-lingual ability by focusing on task-related keywords, enhancing the model's robustness, and regularizing the representations. We find that the representations for low-resource languages can be easily and greatly improved by focusing on just the keywords. Second, we present Order-Reduced Modeling methods for cross-lingual adaptation, and find that modeling partial word orders instead of the whole sequence improves both the model's robustness against word order differences between languages and the transfer of task knowledge to low-resource languages. Third, we propose to leverage different levels of domain-related corpora and additional data masking during pre-training for cross-domain adaptation, and discover that more challenging pre-training can better address the domain discrepancy issue in task knowledge transfer. Finally, we introduce a coarse-to-fine framework, Coach, and a cross-lingual and cross-domain parsing framework, X2Parser. Coach decomposes the representation learning process into coarse-grained and fine-grained feature learning, and X2Parser simplifies hierarchical task structures into flattened ones. We observe that simplifying task structures makes representation learning more effective for low-resource languages and domains.

Low-Resource Adaptation of Neural NLP Models

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Effective Transfer Learning for Low-Resource Natural Language Understanding

Building Low-Resource NER Models Using Non-Speaker Annotation

Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

A Survey on Low-Resource Neural Machine Translation

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Transfer Learning for Low-Resource Clinical Named Entity Recognition

Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation

Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios

Multi-Stage Pre-training for Low-Resource Domain Adaptation

The Importance of Context in Very Low Resource Language Modeling

Handling Syntactic Divergence in Low-resource Machine Translation

Low-Rank Adaptation for Multilingual Summarization: An Empirical Study

Cross lingual transfer learning for zero-resource domain adaptation

Low-resource Languages: A Review of Past Work and Future Challenges

Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields

Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing