MobileNetV4 -- Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard
2024-09-30
Abstract: We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the design of neural networks that are both efficient and accurate on mobile devices. It proposes a new architecture family, MobileNetV4 (MNv4), that balances accuracy against efficiency across different types of mobile hardware: CPUs, DSPs, GPUs, and dedicated accelerators such as the Apple Neural Engine and the Google Pixel EdgeTPU. The main contributions of the paper are:

1. **Universal Inverted Bottleneck (UIB)**: A flexible block that unifies the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a new Extra Depthwise (ExtraDW) variant. Its optional depthwise convolutions provide flexibility in spatial and channel mixing, can enlarge the receptive field, and improve computational efficiency.
2. **Mobile Multi-Query Attention (Mobile MQA)**: An attention block optimized for mobile accelerators that runs inference more than 39% faster than Multi-Head Self-Attention (MHSA). Sharing keys and values across heads reduces memory bandwidth requirements, which raises operational intensity.
3. **Optimized Neural Architecture Search (NAS)**: A two-stage NAS method that runs a coarse-grained search followed by a fine-grained search. This significantly improves search efficiency and allows MNv4 models to be larger and more effective than previous state-of-the-art models.
4. **Performance Modeling and Analysis**: The paper uses performance modeling and analysis to explain how MNv4 achieves high performance on different hardware platforms. These techniques clarify how a model behaves on each type of hardware and guide the model design.
5. **Distillation Technique**: To further improve accuracy, the paper introduces a new distillation technique. Mixing datasets with different data augmentations and adding balanced in-class data improves generalization and accuracy.

Together, these contributions yield MNv4 models that are mostly Pareto optimal across multiple hardware platforms, with a good balance between accuracy and efficiency. For example, the MNv4-Hybrid-Large model reaches 87% Top-1 accuracy on ImageNet-1K with a runtime of 3.8 ms on the Pixel 8 EdgeTPU.
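
The key-and-value sharing that the summary attributes to Mobile MQA can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`multi_query_attention`, `kv_projection_params`), the plain list-of-lists matrices, and the tiny dimensions are all assumptions made for illustration, and the sketch covers only the sharing of one key/value projection across query heads, not any other optimization the paper may apply.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(x, wq_heads, wk, wv):
    """Multi-query attention sketch (hypothetical helper, not the paper's code).

    x        : tokens x d_model input
    wq_heads : one d_model x d_head query projection per head
    wk, wv   : a single d_model x d_head key/value projection shared by all heads
    """
    k = matmul(x, wk)  # keys are computed once and reused by every head
    v = matmul(x, wv)  # values are computed once and reused by every head
    scale = 1.0 / math.sqrt(len(wk[0]))
    head_outputs = []
    for wq in wq_heads:
        q = matmul(x, wq)
        scores = matmul(q, transpose(k))
        weights = [softmax([s * scale for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the per-head outputs along the feature axis.
    return [sum((head[t] for head in head_outputs), []) for t in range(len(x))]


def kv_projection_params(heads, d_model, d_head, shared):
    """Count key plus value projection weights with or without sharing."""
    per_head = 2 * d_model * d_head
    return per_head if shared else per_head * heads


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    wq_a = [[0.5, -0.2], [0.1, 0.3]]
    wq_b = [[-0.4, 0.6], [0.2, 0.1]]
    wk = [[0.2, 0.4], [-0.3, 0.1]]
    wv = [[1.0, 2.0], [3.0, 4.0]]
    for row in multi_query_attention(x, [wq_a, wq_b], wk, wv):
        print(row)
```

In this sketch the key and value tensors do not grow with the number of heads, so less data has to be moved per unit of computation. That is the intuition behind the bandwidth and operational-intensity argument in the summary; the actual speedup depends on the hardware.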