Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Aakanksha, Arash Ahmadian, Seraphina Goldfarb-Tarrant, Beyza Ermis, Marzieh Fadaee, Sara Hooker
2024-10-15
Abstract: Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective -- by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and 7% reduction in harm across all languages on top of the data mixtures method using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to optimize both the safety and the general performance of large language models (LLMs) in a multilingual setting, especially across diverse tasks. Specifically, the authors examine whether model merging can balance safety and general performance more effectively than the traditional data-mixing approach.

### Problem Background

1. **Safety and Multilingual Challenges**
   - LLMs are deployed in a wide range of applications, but ensuring their safe use remains a significant challenge.
   - Existing preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings.
   - Each language presents unique learning challenges across tasks, so a method is needed that can handle them.
2. **Limitations of Traditional Methods**
   - With traditional data mixing, it is difficult to ensure that every task benefits from shared multi-task training. Safety is especially hard to improve this way, and the model's general performance is often affected.

### Core Problems of the Paper

The paper explores two core problems:

1. **Model Merging vs. Data Mixing**
   - The authors study whether, in a multilingual setting, model merging can balance safety and general performance more effectively than traditional data mixing.
   - Specifically, they compare several merging algorithms and evaluate how each performs across languages.
2. **Multilingual Alignment**
   - How to handle the unique structures, cultural differences, and potential biases of each language in order to build robust and safe multilingual models.

### Main Findings

1. **Model Merging Is Superior to Data Mixing**
   - Objective-based merging is more effective than data mixing, with improvements of up to 8% in general performance and up to 10% in safety.
   - SLERP strikes the best balance between safety and general performance, achieving a further 3.1% reduction in harm and a 7.0% improvement in general performance.
2. **Effectiveness of Language-Based Merging**
   - By merging monolingually fine-tuned models, the authors achieve a 4% improvement in general performance and a 7% reduction in harm across all languages, relative to data mixing with the same available data.
   - This indicates that language-based merging is an effective strategy for integrating diverse languages without sacrificing performance on key metrics.
3. **Differences Among Merging Algorithms**
   - Merging algorithms differ in how they trade off safety against general performance. For example, TIES performs well at reducing harmful generations but at a cost to general performance, while SLERP achieves the best balance between the two.

### Conclusion

Through a comprehensive study, the authors show that model merging can balance safety and general performance more effectively than data mixing in a multilingual setting, especially when the tasks are diverse. This finding provides a useful framework for building strong and safe multilingual models.
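
### Illustrative Sketch: SLERP Merging

To make the SLERP finding concrete, the following is a minimal sketch of spherical linear interpolation between the weights of two fine-tuned models (for example, a safety-tuned and a general-purpose checkpoint). It is not the authors' implementation: the function names `slerp` and `merge_checkpoints`, the interpolation weight `t`, the per-tensor application, and the fallback to linear interpolation for near-parallel vectors are all assumptions made for illustration, and weights are represented as plain lists of floats to keep the example self-contained.

```python
import math
from typing import Dict, List, Sequence


def slerp(w_a: Sequence[float], w_b: Sequence[float], t: float,
          eps: float = 1e-8) -> List[float]:
    """Spherically interpolate between two flattened weight vectors.

    t = 0 returns w_a, t = 1 returns w_b (hypothetical convention).
    """
    linear = [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]

    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))
    if norm_a < eps or norm_b < eps:
        # The angle is undefined for a zero vector; fall back to linear.
        return linear

    # Angle between the two weight vectors, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel (or anti-parallel) vectors; fall back to linear.
        return linear

    coef_a = math.sin((1.0 - t) * omega) / sin_omega
    coef_b = math.sin(t * omega) / sin_omega
    return [coef_a * a + coef_b * b for a, b in zip(w_a, w_b)]


def merge_checkpoints(ckpt_a: Dict[str, Sequence[float]],
                      ckpt_b: Dict[str, Sequence[float]],
                      t: float) -> Dict[str, List[float]]:
    """Apply SLERP tensor by tensor to two checkpoints with identical keys."""
    return {name: slerp(ckpt_a[name], ckpt_b[name], t) for name in ckpt_a}


if __name__ == "__main__":
    # Toy "checkpoints" with a single two-parameter tensor each.
    safety_model = {"layer.weight": [1.0, 0.0]}
    general_model = {"layer.weight": [0.0, 1.0]}
    print(merge_checkpoints(safety_model, general_model, t=0.5))
```

Unlike a plain weighted average, SLERP interpolates along the arc between the two weight vectors, so for vectors of equal norm the merged weights keep that norm. The choice of `t` sets how strongly the merged model leans toward either checkpoint; this summary does not report the value used in the paper.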