Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, Shujian Huang
2024-04-06
Abstract: Despite their strong ability to retrieve knowledge in English, current large language models show imbalanced abilities across languages. Two approaches have been proposed to address this: multilingual pretraining and multilingual instruction tuning. However, whether and how such methods contribute to cross-lingual knowledge alignment inside the models is unknown. In this paper, we propose CLiKA, a systematic framework to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency, and Conductivity levels, and explore the effect of multilingual pretraining and instruction tuning on the degree of alignment. Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed. Namely, continued pretraining improves the alignment of the target language at the cost of other languages, while mixed pretraining affects other languages less. Also, the overall cross-lingual knowledge alignment, especially at the conductivity level, is unsatisfactory for all tested LLMs, and neither multilingual pretraining nor instruction tuning can substantially improve cross-lingual knowledge conductivity.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the imbalance in knowledge abilities across languages in current large language models (LLMs): they retrieve knowledge well in English but perform noticeably worse in other languages. Multilingual pretraining and multilingual instruction tuning have been proposed as remedies, yet it is still unclear whether and how these two methods affect cross-lingual knowledge alignment inside the model, in particular cross-lingual knowledge transfer (that is, whether knowledge learned in one language can be effectively retrieved in another).

To evaluate the impact of multilingual pretraining and instruction tuning on cross-lingual knowledge alignment, the authors propose CLiKA, a systematic framework that measures the degree of alignment at three levels: Performance, Consistency, and Conductivity. The results show that:

1. **Multilingual pretraining and instruction tuning benefit cross-lingual knowledge alignment, but the effect is limited**:
   - Continued pretraining improves knowledge alignment for the target language, but at the cost of other languages.
   - Mixed pretraining can greatly improve the basic ability and knowledge performance of multiple languages and affects other languages less.
   - However, neither continued pretraining nor mixed pretraining significantly improves cross-lingual knowledge conductivity.
2. **Cross-lingual knowledge conductivity is generally low**:
   - Even after multilingual pretraining and instruction tuning, all tested LLMs still perform poorly at cross-lingual knowledge transfer, especially at the conductivity level.
3. **Training strategies differ in their effects**:
   - Mixed pretraining is more effective at improving performance and consistency across multiple languages.
   - Continued pretraining can improve performance in the target language, but may damage performance in other languages, and brings limited improvement in cross-lingual consistency and conductivity.

In general, this paper evaluates the impact of multilingual pretraining and instruction tuning on cross-lingual knowledge alignment and proposes the systematic evaluation framework CLiKA, aiming to provide a reference for the future optimization of multilingual LLMs.
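
To make the three levels more concrete, the following is a minimal Python sketch of how each one could be scored. It is an illustration under simplifying assumptions, not the paper's actual metrics: the function names (`accuracy`, `agreement`, `conductivity_ratio`), the exact-match scoring, and the toy answer lists are all hypothetical. Here, performance is taken to be per-language accuracy, consistency is the share of parallel questions that receive the same answer in two languages, and conductivity is the accuracy in a target language on facts taught only in a source language, relative to the accuracy in that source language.

```python
from typing import List


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Performance (assumed definition): fraction of exact-match correct answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def agreement(answers_a: List[str], answers_b: List[str]) -> float:
    """Consistency (assumed definition): fraction of parallel questions that
    get the same answer in both languages, whether or not it is correct."""
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must have the same length")
    if not answers_a:
        return 0.0
    return sum(a == b for a, b in zip(answers_a, answers_b)) / len(answers_a)


def conductivity_ratio(acc_source: float, acc_target: float) -> float:
    """Conductivity (assumed definition): accuracy in the target language on
    facts learned only in the source language, divided by source accuracy."""
    if acc_source == 0:
        return 0.0
    return acc_target / acc_source


# Toy data: four parallel multiple-choice questions (hypothetical).
gold = ["A", "B", "C", "D"]
answers_en = ["A", "B", "C", "A"]
answers_zh = ["A", "C", "C", "A"]

performance_en = accuracy(answers_en, gold)
performance_zh = accuracy(answers_zh, gold)
consistency_en_zh = agreement(answers_en, answers_zh)
# Hypothetical accuracies on facts taught only in English.
conductivity_en_zh = conductivity_ratio(acc_source=0.8, acc_target=0.2)
```

Under these assumed definitions, a model can be consistent without being correct (both languages give the same wrong answer), and it can score well on performance and consistency while conductivity stays low, which matches the pattern the summary describes as shallow alignment.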