Code-switching finetuning: Bridging multilingual pretrained language models for enhanced cross-lingual performance
Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu
DOI: https://doi.org/10.1016/j.engappai.2024.109532
IF: 8
2024-11-10
Engineering Applications of Artificial Intelligence
Abstract: In recent years, pretrained models have significantly advanced natural language processing. However, multilingual sequence-to-sequence pretrained language models (Seq2Seq PLMs) are pretrained on a wide range of languages (e.g., 25 languages), yet are often finetuned for specific bilingual tasks (e.g., English–German). This creates domain and task discrepancies between the pretraining and finetuning stages, which may lead to sub-optimal downstream performance. In this study, we first illustrate these domain and task discrepancies, and then investigate in depth the side effects they may have on both training dynamics and downstream performance. To alleviate those side effects, we introduce a simple and effective code-switching restoration task (namely, code-switching finetuning) into the standard pretrain-finetune pipeline. Specifically, in the first stage, we recast the downstream data into the self-supervised format used for pretraining, in which the denoising signal is the code-switched cross-lingual phrase. In the second stage, the model is finetuned on the downstream task as usual. Experiments spanning both natural language generation (12 supervised translations, 30 zero-shot translations, and 2 cross-lingual summarization tasks) and natural language understanding (7 cross-lingual natural language inference tasks) demonstrate that our method consistently and significantly surpasses the standard finetuning strategy. Analyses show that our method introduces negligible computational cost and reduces cross-lingual representation gaps. The code is publicly available at: https://github.com/zanchangtong/CSR4mBART
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary
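
The first stage described in the abstract (recasting downstream data as a restoration task whose noise is code-switched text) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_restoration_pair`, the toy `lexicon`, the `switch_ratio` parameter, and the use of word-level dictionary substitution in place of aligned cross-lingual phrases are all assumptions made for illustration. The released repository should be consulted for the actual procedure.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_restoration_pair(
    source_tokens: List[str],
    lexicon: Dict[str, str],
    switch_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Build one (noised input, restoration target) training pair.

    Assumption: a word-level bilingual lexicon stands in for the
    cross-lingual phrases mentioned in the abstract. A fraction of the
    translatable source tokens is replaced by their target-language
    counterparts; the model would then be trained to restore the
    original source sentence from the code-switched input.
    """
    rng = rng or random.Random()
    # Positions whose token has an entry in the (hypothetical) lexicon.
    candidates = [i for i, tok in enumerate(source_tokens) if tok in lexicon]
    if not candidates:
        # Nothing to switch: input and target are identical.
        sentence = " ".join(source_tokens)
        return sentence, sentence

    # Switch at least one token, at most all candidates.
    k = min(len(candidates), max(1, round(len(candidates) * switch_ratio)))
    chosen = set(rng.sample(candidates, k))

    noised = [
        lexicon[tok] if i in chosen else tok
        for i, tok in enumerate(source_tokens)
    ]
    return " ".join(noised), " ".join(source_tokens)


if __name__ == "__main__":
    lexicon = {"cat": "Katze", "house": "Haus"}  # toy English-German entries
    noised, target = make_restoration_pair(
        "the cat sleeps in the house".split(),
        lexicon,
        switch_ratio=1.0,  # switch every translatable token
    )
    print(noised)  # the Katze sleeps in the Haus
    print(target)  # the cat sleeps in the house
```

Under this sketch, pairs of this form would be used for the first finetuning stage, after which the model is finetuned on the ordinary downstream pairs (e.g., source sentence to reference translation) as in the standard pipeline.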