Abstract: We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU, a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.

What problem does this paper attempt to address?

This paper addresses how to design neural network models for mobile devices that are both efficient and accurate. Specifically, it proposes a new architecture, MobileNetV4 (MNv4), that balances accuracy and efficiency across different types of mobile hardware, including CPUs, DSPs, GPUs, and accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU. The main contributions are:

1. **Universal Inverted Bottleneck (UIB)**: a flexible search block that unifies the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a new Extra Depthwise (ExtraDW) variant. UIB adds two optional depthwise convolutions to the inverted bottleneck, one before the expansion layer and one between the expansion and projection layers. Toggling them gives flexibility in spatial and channel mixing, allows the receptive field to be enlarged, and improves computational efficiency.
2. **Mobile Multi-Query Attention (Mobile MQA)**: an attention block optimized for mobile accelerators that delivers a 39% inference speedup over Multi-Head Self-Attention (MHSA). By sharing keys and values across all heads, Mobile MQA reduces memory-bandwidth requirements and thereby raises operational intensity.
3. **Optimized Neural Architecture Search (NAS)**: a two-stage NAS recipe that runs a coarse-grained search followed by a fine-grained search. This improves search effectiveness and makes it possible to find larger MNv4 models that outperform previous state-of-the-art models.
4. **Performance modeling and analysis**: the paper uses performance modeling (a roofline analysis) to explain why MNv4 runs efficiently on such different hardware platforms. This analysis clarifies how each model behaves on each platform and guides the architecture design.
5. **Distillation technique**: to further improve accuracy, the paper introduces a new distillation recipe that mixes datasets with different data augmentations and adds class-balanced in-class data, improving generalization and accuracy.

Through these innovations, the MNv4 models are mostly Pareto optimal across multiple hardware platforms, striking a good balance between accuracy and efficiency. For example, with the distillation technique, MNv4-Hybrid-Large reaches 87% top-1 accuracy on ImageNet-1K with a runtime of 3.8 ms on the Pixel 8 EdgeTPU.
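To make the UIB contribution (item 1) concrete, here is a minimal sketch of how the presence or absence of the two optional depthwise (DW) convolutions selects among the four block variants, and how that changes the weight count. It is an illustration, not the paper's code: `UIBConfig`, `variant_name`, and `uib_params` are hypothetical names, and the count ignores biases, normalization layers, strides, and activations.

```python
from dataclasses import dataclass


@dataclass
class UIBConfig:
    """Hypothetical description of one UIB block (kernel 0 = optional DW absent)."""
    in_ch: int
    out_ch: int
    expand_ratio: float
    start_dw_k: int  # depthwise kernel before the 1x1 expansion
    mid_dw_k: int    # depthwise kernel between expansion and projection


def variant_name(cfg: UIBConfig) -> str:
    """Map the two optional depthwise convs to the block type they produce."""
    has_start, has_mid = cfg.start_dw_k > 0, cfg.mid_dw_k > 0
    if has_start and has_mid:
        return "ExtraDW"
    if has_mid:
        return "IB"
    if has_start:
        return "ConvNext-like"
    return "FFN"


def uib_params(cfg: UIBConfig) -> int:
    """Weights of [start DW] -> 1x1 expand -> [mid DW] -> 1x1 project (no biases/BN)."""
    hidden = int(cfg.in_ch * cfg.expand_ratio)
    params = cfg.in_ch * hidden + hidden * cfg.out_ch  # the two pointwise convs
    if cfg.start_dw_k:
        params += cfg.in_ch * cfg.start_dw_k ** 2      # DW on the narrow input
    if cfg.mid_dw_k:
        params += hidden * cfg.mid_dw_k ** 2           # DW on the expanded tensor
    return params


for start_k, mid_k in [(0, 3), (3, 0), (3, 3), (0, 0)]:
    cfg = UIBConfig(in_ch=64, out_ch=64, expand_ratio=4, start_dw_k=start_k, mid_dw_k=mid_k)
    print(f"{variant_name(cfg):<14} params={uib_params(cfg)}")
```

With 64 channels and an expansion ratio of 4, the two pointwise layers dominate the weight count and the optional depthwise layers add comparatively few parameters, which is one reason toggling them is a cheap way to vary spatial mixing and receptive field.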
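The key idea behind Mobile MQA (item 2), sharing keys and values across heads, can be sketched in plain Python. This is a generic multi-query attention forward pass under stated assumptions, not the Mobile MQA block itself: it leaves out any further Mobile MQA specifics, the output projection, and batching, and the helper names are hypothetical. `kv_elements` counts how many key/value activations must be materialized, which is where the memory-bandwidth saving comes from.

```python
import math


def matmul(a, b):
    """Matrix product of lists-of-rows a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(x, wq_heads, wk, wv):
    """Every head has its own query projection, but all heads share ONE key
    projection and ONE value projection.

    x: N x d_model tokens; wq_heads: h matrices (d_model x d_head);
    wk, wv: shared d_model x d_head matrices.
    Returns h per-head outputs (N x d_head); concat + output projection omitted.
    """
    k = matmul(x, wk)                 # computed once, reused by every head
    v = matmul(x, wv)
    k_t = [list(col) for col in zip(*k)]
    scale = 1.0 / math.sqrt(len(wk[0]))
    outputs = []
    for wq in wq_heads:
        q = matmul(x, wq)
        scores = matmul(q, k_t)       # N x N attention logits
        probs = [softmax([s * scale for s in row]) for row in scores]
        outputs.append(matmul(probs, v))
    return outputs


def kv_elements(n_tokens, d_head, n_heads, shared):
    """Key + value activations that must be written and read back."""
    kv_heads = 1 if shared else n_heads
    return 2 * n_tokens * d_head * kv_heads


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # 3 tokens, d_model = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = multi_query_attention(x, [identity, [[0.5, 0.0], [0.0, 2.0]]], identity, identity)
print(len(heads), "heads, each", len(heads[0]), "x", len(heads[0][0]))

# Hypothetical layer: 14x14 = 196 tokens, 8 heads of width 64.
print("MHSA K/V elements:", kv_elements(196, 64, 8, shared=False))
print("MQA  K/V elements:", kv_elements(196, 64, 8, shared=True))
```

With 8 heads (a hypothetical setting), MQA materializes one eighth of the K/V activations that MHSA does, while each head still computes its own attention scores. The same attention arithmetic therefore runs over fewer bytes, which is what higher operational intensity means.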
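The two-stage search in item 3 can be illustrated with a toy example. Everything here is a hypothetical assumption, not the paper's recipe: the coarse stage searches only a channel width with the depthwise layout held fixed, the fine stage then searches the two optional depthwise kernels from item 1, and `proxy_score` is a made-up accuracy-minus-latency formula standing in for real training and on-device latency measurement.

```python
import math
from itertools import product

WIDTHS = [32, 48, 64, 96]   # hypothetical channel-width choices
DW_KERNELS = [0, 3, 5]      # 0 = optional depthwise conv absent


def proxy_score(width, start_k, mid_k):
    """Hypothetical stand-in for an accuracy-vs-latency objective.

    A real NAS would train or fine-tune each candidate and measure its
    latency on the target hardware; these formulas are made up.
    """
    accuracy = 60 + 10 * math.log2(width) + 0.8 * start_k + 1.2 * mid_k
    latency = 0.2 * width + 0.5 * (start_k + mid_k)
    return accuracy - latency


def two_stage_search():
    evaluated = 0

    # Stage 1 (coarse): search widths with the DW layout held fixed
    # (here IB-style: no start DW, 3x3 middle DW).
    best_width, best_score = None, -math.inf
    for w in WIDTHS:
        evaluated += 1
        s = proxy_score(w, 0, 3)
        if s > best_score:
            best_width, best_score = w, s

    # Stage 2 (fine): freeze the width and search both optional DW convs.
    best_cfg, best_score = None, -math.inf
    for start_k, mid_k in product(DW_KERNELS, DW_KERNELS):
        evaluated += 1
        s = proxy_score(best_width, start_k, mid_k)
        if s > best_score:
            best_cfg, best_score = (best_width, start_k, mid_k), s

    return best_cfg, evaluated


cfg, n_evals = two_stage_search()
joint = len(WIDTHS) * len(DW_KERNELS) ** 2
print(f"best (width, start_dw_k, mid_dw_k) = {cfg}; evaluated {n_evals} of {joint} joint candidates")
```

Splitting the search evaluates 4 + 9 = 13 candidates instead of the 4 × 9 = 36 in the joint space. The saving grows multiplicatively as more dimensions are added, at the cost of possibly missing interactions between choices made in different stages.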
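Item 4 refers to performance modeling; a roofline model, the standard tool for this kind of analysis, estimates a layer's latency as the slower of its compute time and its memory time. The sketch below applies it to two layers on two hardware profiles. The hardware numbers, int8 storage (1 byte per element), and the assumption that every tensor crosses the memory bus exactly once are illustrative, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    name: str
    peak_macs_per_s: float  # compute roof
    bytes_per_s: float      # memory-bandwidth roof

    @property
    def ridge_point(self) -> float:
        """Operational intensity (MACs per byte) where the two roofs meet."""
        return self.peak_macs_per_s / self.bytes_per_s


def conv_cost(h, w, c_in, c_out, k, depthwise=False, bytes_per_elem=1):
    """MACs and bytes for a stride-1 'same' conv, assuming each tensor
    (input, output, weights) crosses the memory bus exactly once."""
    if depthwise:
        assert c_in == c_out, "depthwise conv keeps the channel count"
        weights = c_in * k * k
        macs = h * w * c_in * k * k
    else:
        weights = c_in * c_out * k * k
        macs = h * w * weights
    bytes_moved = (h * w * c_in + h * w * c_out + weights) * bytes_per_elem
    return macs, bytes_moved


def roofline_time(macs, bytes_moved, hw):
    """Latency estimate: whichever roof (compute or memory) is slower wins."""
    return max(macs / hw.peak_macs_per_s, bytes_moved / hw.bytes_per_s)


def bound(macs, bytes_moved, hw):
    intensity = macs / bytes_moved
    return "compute-bound" if intensity >= hw.ridge_point else "memory-bound"


# Hypothetical hardware profiles (illustrative numbers only).
low_ridge = Hardware("CPU-like", peak_macs_per_s=1e11, bytes_per_s=2e10)
high_ridge = Hardware("accelerator-like", peak_macs_per_s=2e12, bytes_per_s=2e10)

layers = {
    "3x3 depthwise, 256 ch": conv_cost(56, 56, 256, 256, 3, depthwise=True),
    "1x1 pointwise, 64->256": conv_cost(56, 56, 64, 256, 1),
}

for hw in (low_ridge, high_ridge):
    for name, (macs, nbytes) in layers.items():
        t_us = roofline_time(macs, nbytes, hw) * 1e6
        print(f"{hw.name:<17} {name:<24} {t_us:8.1f} us  {bound(macs, nbytes, hw)}")
```

On the low-ridge profile the pointwise convolution is compute-bound while the depthwise convolution is memory-bound; on the high-ridge profile both are memory-bound. Because ridge points differ this much across hardware, a model tuned for only one kind of device can be far from Pareto optimal on another.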
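Item 5 describes the distillation recipe only at a high level. The following is one plausible way to wire its two ingredients together, under assumptions: batches are drawn from several sources (the same images under different augmentations, plus a class-balanced set of extra in-class images), and the student is trained against the teacher with a standard KL-divergence distillation loss. All names, sampling weights, placeholder augmentations, and the choice of loss are hypothetical, not the paper's exact recipe.

```python
import math
import random


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * (math.log(pt) - math.log(max(ps, 1e-12)))
               for pt, ps in zip(p_t, p_s) if pt > 0)


def class_balanced_subset(examples, per_class, rng):
    """Keep at most `per_class` randomly chosen examples for every label."""
    by_label = {}
    for ex in examples:
        by_label.setdefault(ex["label"], []).append(ex)
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        subset.extend(rng.sample(items, min(per_class, len(items))))
    return subset


def mixed_batches(sources, weights, batch_size, rng):
    """Endlessly yield batches; each example comes from a randomly picked
    (examples, augment_fn) source according to `weights`."""
    while True:
        batch = []
        for _ in range(batch_size):
            examples, augment = rng.choices(sources, weights=weights)[0]
            batch.append(augment(rng.choice(examples)))
        yield batch


def light_aug(ex):
    return {**ex, "aug": "light"}  # placeholder for a mild augmentation


def heavy_aug(ex):
    return {**ex, "aug": "heavy"}  # placeholder for a strong augmentation


rng = random.Random(0)
base = [{"image": i, "label": i % 3} for i in range(30)]
extra = class_balanced_subset([{"image": 100 + i, "label": i % 3} for i in range(90)],
                              per_class=5, rng=rng)
sources = [(base, light_aug), (base, heavy_aug), (extra, light_aug)]
batch = next(mixed_batches(sources, weights=[0.4, 0.4, 0.2], batch_size=8, rng=rng))

teacher_logits = [2.0, 0.5, -1.0]   # hypothetical teacher output for one image
student_logits = [1.0, 0.8, -0.5]   # hypothetical student output
print(len(extra), [ex["aug"] for ex in batch])
print(distillation_loss(student_logits, teacher_logits, temperature=2.0))
```

Capping each class at the same count keeps the extra data from skewing the label distribution, while drawing the same images under different augmentations exposes the student to both easy and hard views of the teacher's targets.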

MobileNetV4 -- Universal Models for the Mobile Ecosystem
