Harder Tasks Need More Experts: Dynamic Routing in MoE Models
Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng
DOI: https://doi.org/10.18653/v1/2024.acl-long.696
2024-01-01
Abstract: In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity, our method dynamically selects experts based on the confidence level in expert selection for each input. This allows for a more efficient utilization of computational resources, activating more experts for complex tasks requiring advanced reasoning and fewer for simpler tasks. Through extensive evaluations, our dynamic routing method demonstrates substantial improvements over conventional Top-2 routing across various benchmarks, achieving an average improvement of 0.7% with less than 90% activated parameters. Further analysis shows our model dispatches more experts to tasks requiring complex reasoning skills, like BBH, confirming its ability to dynamically allocate computational resources in alignment with the input's complexity. Our findings also highlight a variation in the number of experts needed across different layers of the transformer model, offering insights into the potential for designing heterogeneous MoE frameworks. The code and models are available at https://github.com/ZhenweiAn/Dynamic_MoE.
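The contrast between fixed Top-K routing and confidence-based selection can be illustrated with a minimal sketch. The abstract does not specify the selection rule, so the code below assumes one plausible reading: experts are taken in order of decreasing router probability until their cumulative probability reaches a threshold. The function names (dynamic_route, top_k_route), the threshold value, and the example logits are hypothetical and are not taken from the authors' repository.

```python
import math


def softmax(logits):
    """Convert router logits into a probability distribution over experts."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k=2):
    """Fixed Top-K routing: always activate exactly k experts."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def dynamic_route(logits, threshold=0.6):
    """Assumed dynamic rule: activate the smallest set of highest-probability
    experts whose cumulative probability reaches `threshold`."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= threshold:
            break
    return selected


# A token the router is confident about: one expert dominates.
confident_logits = [4.0, 0.0, 0.0, 0.0]
# A token the router is unsure about: probability is spread evenly.
uncertain_logits = [0.0, 0.0, 0.0, 0.0]

print(len(top_k_route(confident_logits)), len(top_k_route(uncertain_logits)))
print(len(dynamic_route(confident_logits)), len(dynamic_route(uncertain_logits)))
```

Under this assumed rule, the confident token activates fewer experts than the uncertain one, while Top-K activates the same number for both. This is the behavior the abstract describes, though the actual implementation may differ in its details.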