Abstract: In recent years, the development of pre-trained models has significantly propelled advancements in natural language processing. However, multilingual sequence-to-sequence pretrained language models (Seq2Seq PLMs) are pretrained on a wide range of languages (e.g., 25 languages), yet are often finetuned for specific bilingual tasks (e.g., English–German). This creates domain and task discrepancies between the pretraining and finetuning stages, which may result in sub-optimal downstream performance. In this study, we first illustrate these domain and task discrepancies, and then conduct an in-depth investigation into the side effects they may have on both training dynamics and downstream performance. To alleviate those side effects, we introduce a simple and effective code-switching restoration task (namely, code-switching finetuning) into the standard pretrain-finetune pipeline. Specifically, in the first stage, we recast the downstream data into the self-supervised format used for pretraining, in which the denoising signal is the code-switched cross-lingual phrase. In the second stage, the model is finetuned on the downstream task as usual. Experiments spanning both natural language generation tasks (12 supervised translations, 30 zero-shot translations, and 2 cross-lingual summarization tasks) and natural language understanding tasks (7 cross-lingual natural language inference tasks) demonstrate that our approach consistently and significantly surpasses the standard finetuning strategy. Analyses show that our method introduces negligible computational cost and reduces cross-lingual representation gaps. We have made the code publicly available at: https://github.com/zanchangtong/CSR4mBART.
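
To make the first stage more concrete, the following is a minimal sketch of how a downstream sentence might be recast as a code-switching restoration example. It is an illustration under stated assumptions, not the released implementation: the function name `make_code_switched_example`, the toy `phrase_table`, and the `switch_ratio` parameter are all hypothetical, and the way the paper actually selects and replaces cross-lingual phrases may differ.

```python
import random


def make_code_switched_example(src_tokens, phrase_table, switch_ratio=0.3, seed=0):
    """Build one (noised_input, restoration_target) pair.

    Hypothetical helper: source phrases found in `phrase_table` are replaced
    by their target-language counterparts with probability `switch_ratio`.
    The restoration target is the original, unmodified sentence.
    """
    rng = random.Random(seed)
    noised = []
    i = 0
    while i < len(src_tokens):
        replaced = False
        # Prefer longer phrases so multi-word units are switched as a whole.
        for n in (3, 2, 1):
            phrase = tuple(src_tokens[i:i + n])
            if len(phrase) == n and phrase in phrase_table and rng.random() < switch_ratio:
                noised.extend(phrase_table[phrase])
                i += n
                replaced = True
                break
        if not replaced:
            noised.append(src_tokens[i])
            i += 1
    return noised, list(src_tokens)


# Toy English->German phrase table (assumed for illustration only).
toy_table = {("good", "morning"): ["guten", "Morgen"]}

# switch_ratio=1.0 forces every matching phrase to be switched.
noised, target = make_code_switched_example(
    ["good", "morning", "everyone"], toy_table, switch_ratio=1.0
)
print(noised)  # ['guten', 'Morgen', 'everyone']
print(target)  # ['good', 'morning', 'everyone']
```

In this sketch the model would receive the code-switched sequence as input and be trained to restore the original sentence, which mirrors the denoising format of pretraining. The second stage, ordinary finetuning on the downstream task, is unchanged.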

Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment

Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling Via Adjustive and Forced Cross-Task Alignment

Mix Before Align: Towards Zero-shot Cross-lingual Sentiment Analysis Via Soft-Mix and Multi-View Learning

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Reducing Multilingual Context Confusion for End-to-end Code-switching Automatic Speech Recognition

Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language

Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Improving Cross-lingual Representation for Semantic Retrieval with Code-switching

Enhancing Code-switching Speech Recognition with Interactive Language Biases

HC$^2$L: Hybrid and Cooperative Contrastive Learning for Cross-lingual Spoken Language Understanding

Code-switching finetuning: Bridging multilingual pretrained language models for enhanced cross-lingual performance

PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Code-Switching Curriculum Learning for Multilingual Transfer in LLMs

Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning