CMN: a co-designed neural architecture search for efficient computing-in-memory-based mixture-of-experts
Shihao Han, Sishuo Liu, Shucheng Du, Mingzi Li, Zijian Ye, Xiaoxin Xu, Yi Li, Zhongrui Wang, Dashan Shang
DOI: https://doi.org/10.1007/s11432-024-4144-y
2024-09-27
Science China Information Sciences
Abstract: Artificial intelligence (AI) has advanced substantially in recent years, notably with the advent of large-scale language models (LLMs) that employ mixture-of-experts (MoE) techniques and exhibit human-like cognitive skills. The computing-in-memory (CIM) architecture is a promising hardware solution for edge MoE implementations. It collocates memory and computing within a single device, which significantly reduces data movement and the associated energy consumption. However, edge application scenarios and constraints are diverse, so it remains unclear how to choose the optimal MoE network structure on a CIM system, including the location, number, and dimension of the experts. To this end, we introduce a software-hardware co-designed neural architecture search (NAS) framework, CIM-based MoE NAS (CMN), which identifies a high-performing MoE structure under specific hardware constraints. Segmentation results on the NYUD-v2 dataset on an RRAM (SRAM) CIM system show that CMN can discover optimized MoE configurations under energy, latency, and performance constraints. Compared with the baseline MoE-enabled Visual Transformer, these configurations achieve 29.67× (43.10×) energy savings, a 175.44× (109.89×) speedup, and a 12.24× smaller model size. This co-design opens up an avenue toward high-performance MoE deployments in edge CIM systems.
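The abstract describes the search only at a high level: pick where MoE layers go, how many experts each has, and how wide each expert is, subject to energy and latency budgets. The following is a minimal sketch of that kind of hardware-constrained search. It assumes a toy CIM cost model and a placeholder accuracy proxy, neither of which comes from the paper. All names (`MoEConfig`, `estimate_cost`, `constrained_search`, `proxy_score`, `EMBED_DIM`) and every cost coefficient are hypothetical. Exhaustive enumeration also stands in for whatever search strategy CMN actually uses.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import Callable, Iterator, Optional

EMBED_DIM = 64  # hypothetical token embedding width


@dataclass(frozen=True)
class MoEConfig:
    moe_layers: tuple  # indices of blocks hosting an MoE layer (expert location)
    num_experts: int   # experts per MoE layer (expert quantity)
    expert_dim: int    # hidden width of each expert MLP (expert dimension)


@dataclass(frozen=True)
class Cost:
    energy: float   # arbitrary units (hypothetical)
    latency: float  # arbitrary units (hypothetical)
    params: int     # total expert weights resident in the CIM arrays


def search_space(num_blocks: int, expert_counts, expert_dims,
                 max_moe_layers: int) -> Iterator[MoEConfig]:
    """Enumerate every (location, quantity, dimension) combination."""
    for k in range(1, max_moe_layers + 1):
        for layers in combinations(range(num_blocks), k):
            for n, d in product(expert_counts, expert_dims):
                yield MoEConfig(layers, n, d)


def estimate_cost(cfg: MoEConfig, top_k: int = 1, energy_per_mac: float = 1.0,
                  latency_per_layer: float = 10.0,
                  latency_per_col: float = 0.1) -> Cost:
    """Toy CIM cost model (hypothetical): all expert weights stay stored in the
    arrays, but each token activates only top_k experts per MoE layer."""
    weights_per_expert = 2 * EMBED_DIM * cfg.expert_dim  # up- and down-projection
    n_layers = len(cfg.moe_layers)
    active = n_layers * min(top_k, cfg.num_experts) * weights_per_expert
    energy = energy_per_mac * active
    latency = n_layers * (latency_per_layer + latency_per_col * cfg.expert_dim)
    params = n_layers * cfg.num_experts * weights_per_expert
    return Cost(energy, latency, params)


def constrained_search(space, score_fn: Callable[[MoEConfig], float],
                       max_energy: float, max_latency: float) -> Optional[MoEConfig]:
    """Return the feasible config with the highest score; ties favor fewer params."""
    best, best_key = None, None
    for cfg in space:
        cost = estimate_cost(cfg)
        if cost.energy > max_energy or cost.latency > max_latency:
            continue  # violates a hardware budget
        key = (score_fn(cfg), -cost.params)
        if best_key is None or key > best_key:
            best, best_key = cfg, key
    return best


def proxy_score(cfg: MoEConfig) -> float:
    # Placeholder for task accuracy; a real NAS would evaluate trained candidates.
    return len(cfg.moe_layers) * (cfg.num_experts * cfg.expert_dim) ** 0.5


space = list(search_space(num_blocks=4, expert_counts=[2, 4, 8],
                          expert_dims=[32, 64, 128], max_moe_layers=2))
best = constrained_search(space, proxy_score, max_energy=20_000, max_latency=30.0)
print(best, estimate_cost(best))
```

Because only `top_k` experts fire per token, adding experts in this toy model raises stored parameters but not per-token energy. The hypothetical size tie-break therefore matters when it is set against edge memory limits.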
computer science, information systems, engineering, electrical & electronic