Abstract: Multi-lingual language models (LMs), such as mBERT, XLM-R, mT5, and mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study: Are the cross-lingual connections between every language pair equally strong? What properties of the source and target languages affect the strength of cross-lingual transfer? Can we quantify the impact of those properties on cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of the cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations let us interpret the cross-lingual understanding of the mT5 model, and they can be used to choose the best source language for a task and to anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology, and phonology is a good predictor of cross-lingual transfer, significantly better than lexical similarity alone. For a given language, we are able to predict zero-shot performance, which increases on a logarithmic scale with the number of few-shot target-language data points.
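As a rough illustration of the logarithmic trend mentioned in the abstract, the sketch below fits a curve of the form score ≈ a + b · ln(n), where n is the number of few-shot target-language examples. The functional form, the helper name `fit_log_curve`, and the numbers are assumptions made for illustration only. They are synthetic placeholders, not results or code from the paper, and the paper's actual statistical framework may differ.

```python
import math

def fit_log_curve(n_shots, scores):
    """Ordinary least-squares fit of score ≈ a + b * ln(n).

    Hypothetical helper for illustration; not the paper's method.
    """
    xs = [math.log(n) for n in n_shots]  # regress on ln(n), so n must be > 0
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(scores) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx            # slope: gain per unit increase in ln(n)
    a = mean_y - b * mean_x  # intercept: fitted score at n = 1
    return a, b

# Synthetic, noise-free placeholder data (NOT measurements from the paper).
n_shots = [10, 100, 1000, 10000]
scores = [0.40 + 0.05 * math.log(n) for n in n_shots]

a, b = fit_log_curve(n_shots, scores)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

Under this assumed form, each tenfold increase in target-language examples adds the same fixed amount, b · ln(10), to the predicted score, which is one way to read "anticipating training data demands".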

On Learning Universal Representations Across Languages

Hierarchical and Bidirectional Joint Multi-Task Classifiers for Natural Language Understanding

A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT

Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision

HC$^2$L: Hybrid and Cooperative Contrastive Learning for Cross-lingual Spoken Language Understanding

mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models?

HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings

FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding

Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Machine-Created Universal Language for Cross-lingual Transfer

Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment

Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization

Multi-Source Cross-Lingual Model Transfer: Learning What to Share

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

How Do Multilingual Encoders Learn Cross-lingual Representation?

Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding

Unsupervised Cross-Lingual Sentence Representation Learning via Linguistic Isomorphism

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

Zero-Resource Multilingual Model Transfer: Learning What to Share