Abstract: Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective -- by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and 7% reduction in harm across all languages on top of the data mixtures method using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.

What problem does this paper attempt to address?

The paper asks how to optimize both the safety and the general performance of large language models (LLMs) in a multilingual setting, especially when the model must handle diverse tasks. Specifically, the authors explore whether model merging can balance safety and overall performance more effectively than the traditional data-mixing approach.

### Problem Background

1. **Safety and Multilingual Challenges**
   - Large language models are used in a wide range of applications, but ensuring their safe use remains a significant challenge.
   - Existing preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and these safety protocols frequently fail to extend to multilingual settings.
   - Each language presents unique learning challenges across tasks, so a method is needed that can handle this variation.
2. **Limitations of Traditional Methods**
   - With traditional data mixing, it is difficult to ensure that every task benefits from shared multi-task training. Safety in particular can suffer, and the overall performance of the model is often affected.

### Core Problems of the Paper

The paper explores two core problems:

1. **Model Merging vs. Data Mixing**
   - The authors study whether, in a multilingual setting, model merging can balance safety and general performance more effectively than traditional data mixing.
   - They compare several merging algorithms and evaluate each of them in the multilingual setting.
2. **Multilingual Alignment**
   - The authors ask how to handle the unique structures, cultural differences, and potential biases of each language in order to build robust and safe multilingual models.

### Main Findings

1. **Model Merging Is Superior to Data Mixing**
   - Objective-based merging is more effective than data mixing, improving general performance and safety by up to 8% and 10%, respectively.
   - In particular, SLERP gives the best balance between safety and general performance, achieving a further 3.1% reduction in harm and a 7.0% improvement in general performance.
2. **Effectiveness of Multilingual Models**
   - By merging monolingually fine-tuned models, the authors achieve a 4% improvement in general performance and a 7% reduction in harm across all languages, compared with data mixing on the same available data.
   - This indicates that language-based merging is an effective strategy for integrating diverse languages without sacrificing performance on key metrics.
3. **Differences in the Performance of Different Merging Algorithms**
   - Merging algorithms differ in how they trade off safety against general performance. For example, TIES performs well at reducing harmful generations but at some cost to general performance, while SLERP achieves the best balance between the two.

### Conclusion

Through a comprehensive study, the authors show that model merging can balance safety and general performance in a multilingual setting more effectively than data mixing, especially when dealing with diverse tasks. This finding provides a useful framework for building strong and safe multilingual models.
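To make the SLERP merging named in the findings more concrete, here is a minimal sketch of spherical linear interpolation between two sets of weights. It is an illustration only, not the authors' implementation: the summary does not describe their code, and the function names `slerp` and `merge_checkpoints`, the fallback to linear interpolation, and the representation of a checkpoint as a dictionary of flat float lists are all assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors.

    t=0 returns v0, t=1 returns v1. Falls back to plain linear
    interpolation when the angle between the vectors is degenerate.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so interpolate linearly.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the two vectors, clamped for safety.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel or anti-parallel vectors: linear fallback.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def merge_checkpoints(ckpt_a, ckpt_b, t=0.5):
    """Merge two checkpoints parameter by parameter (hypothetical layout:
    each checkpoint maps a parameter name to a flat list of floats)."""
    return {name: slerp(t, ckpt_a[name], ckpt_b[name]) for name in ckpt_a}


# Hypothetical toy checkpoints standing in for a safety-tuned and a
# general-purpose model that share the same architecture.
safety_ckpt = {"layer.weight": [1.0, 0.0]}
general_ckpt = {"layer.weight": [0.0, 1.0]}
merged = merge_checkpoints(safety_ckpt, general_ckpt, t=0.5)
```

In this sketch the interpolation factor `t` controls how far the merged weights move from the first model toward the second. Unlike a plain average, SLERP follows the arc between the two weight directions, which is one commonly cited reason for preferring it when merging two models.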

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging

It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization

Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

Unlocking the Potential of Model Merging for Low-Resource Languages

HM3: Heterogeneous Multi-Class Model Merging

Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

BiMix: Bivariate Data Mixing Law for Language Model Pretraining

Unconstrained Model Merging for Enhanced LLM Reasoning

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Efficient Online Data Mixing For Language Model Pre-Training

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

Training-Free Pretrained Model Merging

What Matters for Model Merging at Scale?

MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic