Abstract: Multi-lingual language models (LMs), such as mBERT, XLM-R, mT5, and mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study: Are the cross-lingual connections between every language pair equally strong? What properties of the source and target languages affect the strength of cross-lingual transfer? Can we quantify the impact of those properties on cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of the cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret the cross-lingual understanding of the mT5 model. Using them, one can choose the best source language for a task and anticipate its training data demands. A key finding of this work is that similarities in syntax, morphology, and phonology are good predictors of cross-lingual transfer, significantly better than the lexical similarity of languages alone. For a given language, we are able to predict zero-shot performance, which increases on a logarithmic scale with the number of few-shot target-language data points.
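The abstract's closing claim, that performance grows on a logarithmic scale with the number of few-shot target-language examples, can be illustrated with a small curve fit. The sketch below is only an illustration under stated assumptions: the functional form `score = zero_shot + slope * log(1 + n)`, the function names `fit_log_curve` and `predict`, and the data points are all hypothetical and are not taken from the paper. The points are generated from the assumed curve itself, so the fit simply recovers the parameters that produced them.

```python
import math

def fit_log_curve(n_shots, scores):
    """Ordinary least-squares fit of score = zero_shot + slope * log(1 + n).

    The log(1 + n) form is an assumption chosen so that n = 0 (zero-shot)
    is well defined; the paper may use a different parameterization.
    """
    xs = [math.log1p(n) for n in n_shots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    zero_shot = mean_y - slope * mean_x  # intercept = predicted score at n = 0
    return zero_shot, slope

def predict(zero_shot, slope, n):
    """Predicted score after n few-shot target-language examples."""
    return zero_shot + slope * math.log1p(n)

# Synthetic points generated from the assumed curve (NOT results from the paper).
n_shots = [0, 10, 100, 1000, 10000]
scores = [0.5 + 0.05 * math.log1p(n) for n in n_shots]

zero_shot, slope = fit_log_curve(n_shots, scores)
print(f"zero-shot estimate: {zero_shot:.3f}, gain per log-unit: {slope:.3f}")
```

Under this assumed form, each tenfold increase in target-language examples adds roughly the same increment to the score, which is what a logarithmic trend implies for anticipating training data demands.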

Does Typological Blinding Impede Cross-Lingual Sharing?

SIGTYP 2020 Shared Task: Prediction of Typological Features

Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures

Reconstructing Native Language Typology from Foreign Language Usage

The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments

The Past, Present, and Future of Typological Databases in NLP

Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation

From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings

Cross-Lingual Transfer of Cognitive Processing Complexity

Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation

Probing Language Identity Encoded in Pre-Trained Multilingual Models: a Typological View

Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer

Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models

Why do language models perform worse for morphologically complex languages?

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

Isomorphic Cross-lingual Embeddings for Low-Resource Languages

How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning

Language Embeddings Sometimes Contain Typological Generalizations

An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models

Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages