Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices

Chunjie Luo, Xiwen He, Jianfeng Zhan, Lei Wang, Wanling Gao, Jiahui Dai
DOI: https://doi.org/10.48550/arXiv.2005.05085
2020-05-07
Abstract: Due to increasing amounts of data and compute resources, deep learning has achieved many successes in various domains. As the application of deep learning on mobile and embedded devices attracts more and more attention, benchmarking and ranking the AI abilities of these devices has become an urgent problem. Considering model diversity and framework diversity, we propose a benchmark suite, AIoTBench, which focuses on evaluating the inference abilities of mobile and embedded devices. AIoTBench covers three typical heavy-weight networks (ResNet50, InceptionV3, DenseNet121) as well as three light-weight networks (SqueezeNet, MobileNetV2, MnasNet). Each network is implemented in three frameworks designed for mobile and embedded devices: TensorFlow Lite, Caffe2, and PyTorch Mobile. To compare and rank the AI capabilities of the devices, we propose two unified metrics as AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). Currently, we have compared and ranked 5 mobile devices using our benchmark. This list will be extended and updated soon.
Machine Learning, Artificial Intelligence, Performance, Image and Video Processing
What problem does this paper attempt to address?
The problem this paper attempts to solve is the following: as deep learning succeeds in many fields, and its application on mobile and embedded devices receives growing attention, effectively benchmarking and ranking the AI inference capabilities of these devices has become an urgent problem. Specifically, the paper addresses three issues:

1. **Model diversity**: Different neural network architectures make different trade-offs between accuracy and computational complexity, and no single architecture suits all design and application scenarios. A benchmark suite therefore needs to cover a variety of typical and lightweight architectures to evaluate devices comprehensively.
2. **Framework diversity**: Many popular deep-learning frameworks (such as TensorFlow Lite, Caffe2, and PyTorch Mobile) offer different implementations and levels of support on mobile and embedded devices, so a benchmark needs to compare performance across frameworks.
3. **Hardware acceleration support**: Modern mobile and embedded devices are often equipped with hardware accelerators such as GPUs or NPUs to support AI applications, but devices differ in how well these accelerators are supported, and a benchmarking method should reflect this difference.

To address these problems, the authors propose a benchmark suite named AIoTBench, which focuses on evaluating the inference capabilities of mobile and embedded devices. AIoTBench covers three typical heavy-weight networks (ResNet50, InceptionV3, DenseNet121) and three light-weight networks (SqueezeNet, MobileNetV2, MnasNet), and each network is implemented in three frameworks designed for mobile and embedded devices (TensorFlow Lite, Caffe2, PyTorch Mobile).
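The model-by-framework coverage described above can be pictured as a configuration list that a device is run through. This is a minimal illustration, not AIoTBench's actual code: the function name `benchmark_matrix` and the "heavy"/"light" labels are hypothetical, and the sketch assumes, as the abstract states, that every network is implemented in all three frameworks.

```python
from itertools import product

# Networks and frameworks named in the AIoTBench description.
HEAVY_WEIGHT = ["ResNet50", "InceptionV3", "DenseNet121"]
LIGHT_WEIGHT = ["SqueezeNet", "MobileNetV2", "MnasNet"]
FRAMEWORKS = ["TensorFlow Lite", "Caffe2", "PyTorch Mobile"]


def benchmark_matrix():
    """Return every (network, category, framework) combination to run on a device.

    Hypothetical helper: assumes each network is available in every framework.
    """
    networks = [(n, "heavy") for n in HEAVY_WEIGHT] + [(n, "light") for n in LIGHT_WEIGHT]
    return [(net, cat, fw) for (net, cat), fw in product(networks, FRAMEWORKS)]


if __name__ == "__main__":
    configs = benchmark_matrix()
    print(len(configs))  # 18 = 6 networks x 3 frameworks
    for net, cat, fw in configs[:3]:
        print(net, cat, fw)
```

Keeping heavy-weight and light-weight networks in one list makes it easy to report scores per category as well as for the whole suite.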
In addition, to compare and rank the AI capabilities of different devices, the authors propose two unified metrics as AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). These two metrics reflect the trade-off between quality and performance in AI systems. In summary, the paper aims to help users understand and evaluate the AI inference capabilities of mobile and embedded devices by providing a comprehensive and easy-to-use benchmark suite.
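Assuming a "valid" image is one the model classifies correctly, one plausible way to compute the two scores is to weight measured throughput by accuracy (VIPS), and additionally by the model's FLOPs per image (VOPS). The sketch below assumes that per-run definition and sums over all (network, framework) runs to produce a device score; the exact formulas and aggregation used in the paper may differ, and the `Run` class, its field names, and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measured (network, framework) run on a device (hypothetical structure)."""
    accuracy: float           # fraction of images classified correctly, in [0, 1]
    images_per_second: float  # measured inference throughput
    flops_per_image: float    # floating-point operations per inference of the model


def vips(runs):
    """Valid Images Per Second: throughput weighted by accuracy, summed over runs (assumed aggregation)."""
    return sum(r.accuracy * r.images_per_second for r in runs)


def vops(runs):
    """Valid FLOPs Per Second: each valid image is also weighted by its model's FLOPs (assumed definition)."""
    return sum(r.accuracy * r.images_per_second * r.flops_per_image for r in runs)


if __name__ == "__main__":
    # Hypothetical measurements for two runs on one device.
    runs = [Run(0.75, 20.0, 4.0e9), Run(0.5, 40.0, 1.0e9)]
    print(vips(runs))  # 35.0
    print(vops(runs))  # 80000000000.0
```

Under these assumptions, weighting by FLOPs lets VOPS credit a device for running heavier models, whereas VIPS on its own favors light-weight networks with high throughput.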