Abstract: We find a surprising connection between multitask learning and robustness to neuron failures. Our experiments show that bilingual language models retain higher performance than equivalent monolingual ones under various neuron perturbations, such as random deletions, magnitude pruning, and weight noise. We provide a theoretical justification for this robustness by mathematically analyzing linear representation learning and showing that multitasking creates more robust representations. Our analysis connects robustness to spectral properties of the learned representation and proves that multitasking leads to higher robustness for diverse task vectors. We open-source our code and models: https://github.com/giannisdaras/multilingual_robustness
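
The three perturbation types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names (`random_deletion`, `magnitude_prune`, `add_gaussian_noise`) are hypothetical, and the sketch assumes each perturbation is applied to individual entries of a single weight matrix, which may differ from how the paper applies them to a full language model.

```python
import numpy as np


def random_deletion(w, frac, rng):
    """Zero out a uniformly random `frac` of the entries of `w` (assumed entry-wise)."""
    out = w.copy()
    k = int(round(frac * out.size))
    idx = rng.choice(out.size, size=k, replace=False)
    out.reshape(-1)[idx] = 0.0  # reshape of a fresh contiguous copy is a view
    return out


def magnitude_prune(w, frac):
    """Zero out the `frac` of entries of `w` with the smallest absolute value."""
    out = w.copy()
    k = int(round(frac * out.size))
    idx = np.argsort(np.abs(out), axis=None)[:k]  # flat indices of the k smallest
    out.reshape(-1)[idx] = 0.0
    return out


def add_gaussian_noise(w, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with standard deviation `sigma`."""
    return w + sigma * rng.standard_normal(w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 16))
    print("deleted:", np.count_nonzero(random_deletion(w, 0.25, rng) == 0))
    print("pruned: ", np.count_nonzero(magnitude_prune(w, 0.25) == 0))
    print("noise L2:", np.linalg.norm(add_gaussian_noise(w, 0.1, rng) - w))
```

In a robustness comparison, one would apply the same perturbation at increasing strength to two models and track how quickly each model's task metric degrades.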

What problem does this paper attempt to address?

The question this paper addresses is: **Are multi-task learning models more robust than single-task models, and in particular, how do they perform under neuron failures or weight perturbations?** Specifically, the authors examine the performance of bilingual language models under different types of neuron perturbations (random deletion, magnitude pruning, and additive Gaussian noise) and offer a theoretical explanation of why multi-task learning can improve model robustness.

### Research Background

1. **Cognitive Reserve and Bilingualism**
   - Research in cognitive science has shown that the brains of bilinguals are more robust to neurodegenerative diseases such as dementia, a phenomenon called "cognitive reserve".
   - Bilinguals can maintain normal cognitive function even when brain structure is damaged.
2. **Multi-task Learning and Model Robustness**
   - Inspired by this research, the authors ask whether artificial neural networks show similar robustness under multi-task training.
   - Specific questions include: Can multi-task learning improve a model's resistance to neuron failures? If so, what is the theoretical mechanism behind it?

### Main Contributions

1. **Experimental Verification**
   - The authors found experimentally that the performance of bilingual language models degrades more slowly under various neuron perturbations, and that these models even outperform monolingual models under high-noise conditions.
   - This phenomenon is not limited to GPT-2; it has also been verified on datasets such as MNIST, CIFAR10, and Newsgroup20.
2. **Theoretical Analysis**
   - The authors proved mathematically that multi-task learning creates more robust representations, in particular when the task vectors are independent random Gaussian vectors.
   - They introduced a theoretical framework based on linear representation learning and used singular value decomposition (SVD) to explain how multi-task learning leads to higher robustness.
   - The key conclusion is that as the number of tasks increases, the model's average mean-squared error (MSE) under additive noise decreases; that is, the model becomes more robust.
3. **Practical Applications**
   - The authors showed that multi-task learning not only improves model robustness but also acts as a regularizer, similar to \( \ell_2 \) regularization.
   - Experimental results show that multi-task learning can significantly improve model robustness in text generation and classification tasks.

### Summary

The core question of this paper is whether multi-task learning (especially in bilingual or multilingual models) improves a model's robustness to neuron failures, and the paper answers it affirmatively through both experiments and theoretical analysis. The authors not only verified the phenomenon but also examined the underlying mathematical principles, showing how multi-task learning improves noise resistance by creating more robust representations.
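
The linear-representation setting described above can be explored with a toy NumPy sketch. It is an illustrative construction under stated assumptions, not the paper's exact model or proof: the task vectors are drawn as i.i.d. Gaussians, the shared representation is taken to be the top `k` left singular vectors of the task matrix, and the helper names (`fit_linear_representation`, `noise_mse`) and all dimensions are hypothetical. The script measures only the error induced by additive Gaussian noise on the representation weights; how that error scales with the number of tasks depends on normalization choices that this sketch does not take from the paper.

```python
import numpy as np


def fit_linear_representation(C, k):
    """Rank-k shared representation for task vectors stored as columns of C (d x T).

    Returns W (k x d), the top-k left singular vectors of C, and the per-task
    heads Gamma (k x T), so that W.T @ Gamma is the best rank-k approximation of C.
    """
    # full_matrices=True keeps W at shape (k, d) even when T < k.
    U, _, _ = np.linalg.svd(C, full_matrices=True)
    W = U[:, :k].T
    Gamma = W @ C
    return W, Gamma


def noise_mse(W, Gamma, sigma, rng, trials=200):
    """Average squared change in the task predictors when noise is added to W."""
    clean = W.T @ Gamma
    errs = []
    for _ in range(trials):
        W_noisy = W + sigma * rng.standard_normal(W.shape)
        errs.append(np.mean((W_noisy.T @ Gamma - clean) ** 2))
    return float(np.mean(errs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, sigma = 32, 8, 0.1  # hypothetical sizes chosen for illustration
    for T in (1, 2, 4, 8, 16):
        C = rng.standard_normal((d, T))  # assumed i.i.d. Gaussian task vectors
        W, Gamma = fit_linear_representation(C, k)
        print(f"T={T:2d}  noise-induced MSE={noise_mse(W, Gamma, sigma, rng):.5f}")
```

Averaging over many noise draws, as `noise_mse` does, keeps the comparison across task counts from being dominated by a single random perturbation.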

Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve
