MAP: Model Merging with Amortized Pareto Front Using Limited Computation

Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio
Abstract: Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. However, existing methods focus on enhancing average task accuracy, often neglecting the trade-offs between different tasks. We introduce Model Merging with Amortized Pareto Front (MAP), a novel low-compute algorithm that efficiently identifies a Pareto set of scaling coefficients for merging multiple models. MAP uses a quadratic approximation surrogate model to estimate task metrics, enabling amortized inference. Our approach is particularly valuable in federated learning scenarios, where it can balance performance across diverse client datasets while respecting privacy constraints and minimizing communication overhead. Experimental results on vision and natural language processing tasks demonstrate MAP's ability to accurately identify the Pareto front, offering practitioners a range of …
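The idea of a quadratic surrogate over scaling coefficients can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`quad_features`, `fit_surrogate`, `pareto_mask`) and the toy `true_metrics` function are hypothetical, and the two-task metrics are synthetic stand-ins for real evaluations of merged models. The sketch fits one quadratic per task from a few sampled coefficient vectors, then evaluates the cheap surrogate on a dense grid and keeps the non-dominated points.

```python
import numpy as np

def quad_features(c):
    """Map coefficients c of shape (m, n) to [1, c_i, c_i * c_j for i <= j]."""
    c = np.atleast_2d(c)
    m, n = c.shape
    iu, ju = np.triu_indices(n)
    return np.hstack([np.ones((m, 1)), c, c[:, iu] * c[:, ju]])

def fit_surrogate(c, y):
    """Least-squares fit of one quadratic surrogate per task; y has shape (m, T)."""
    w, *_ = np.linalg.lstsq(quad_features(c), y, rcond=None)
    return w  # shape (n_features, T)

def pareto_mask(scores):
    """True for rows not dominated by any other row (higher is better)."""
    mask = np.ones(len(scores), dtype=bool)
    for i, s in enumerate(scores):
        dominated_by = np.all(scores >= s, axis=1) & np.any(scores > s, axis=1)
        mask[i] = not dominated_by.any()
    return mask

def true_metrics(c):
    """Hypothetical toy metrics: each task prefers a different coefficient mix."""
    t1 = 1.0 - (c[:, 0] - 0.8) ** 2 - (c[:, 1] - 0.2) ** 2
    t2 = 1.0 - (c[:, 0] - 0.2) ** 2 - (c[:, 1] - 0.8) ** 2
    return np.stack([t1, t2], axis=1)

# A small budget of "expensive" evaluations at random scaling coefficients.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(20, 2))
w = fit_surrogate(samples, true_metrics(samples))

# Amortized step: query the surrogate on a dense grid instead of re-evaluating.
axis = np.linspace(0.0, 1.0, 21)
grid = np.array([(a, b) for a in axis for b in axis])
pred = quad_features(grid) @ w
front = pareto_mask(pred)
pareto_coeffs = grid[front]  # candidate scaling coefficients on the estimated front
```

In this toy setting the metrics are exactly quadratic, so the surrogate reproduces them up to numerical error; with real task metrics the fit would only be an approximation, and the estimated front would need to be checked against actual evaluations.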