Abstract: In recent years, deep neural networks (DNNs) have become more powerful and more popular, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit from the new opportunities enabled by deep learning techniques. However, DNNs are by nature computationally and memory intensive, which makes them challenging to deploy on mobile devices. Binary neural networks (BNNs) are considered a promising solution that can significantly reduce the memory and computational requirements of DNNs while still offering capabilities similar to those of full-precision DNN models. Existing GPU-accelerated implementations of BNNs are tailored only for desktop platforms. Because of architectural differences, simply porting such implementations to mobile devices yields suboptimal performance or is impossible in some cases. GPU-accelerated implementations of BNNs for mobile devices are therefore still missing from the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations for efficient binary convolution, including a locality-friendly data layout, bit packing with vectorization, and layer integration. We also provide a detailed implementation and parallelization optimization for PhoneBit to make the best use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit achieves significant speedup and energy efficiency compared with state-of-the-art frameworks for mobile devices. The PhoneBit open source library is available for download at https://code.ihub.org.cn/projects/915/repository/PhoneBit.
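To make the idea of bit packing for binary convolution more concrete, the sketch below shows the standard XNOR-and-popcount formulation of a dot product between two vectors of +1/-1 values. It is an illustrative CPU-side example only, not PhoneBit's GPU kernel; the function names `pack_bits` and `binary_dot` and the bit convention (+1 stored as bit 1, -1 as bit 0) are assumptions made for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one int (+1 -> bit 1, -1 -> bit 0).

    Hypothetical helper for illustration; the bit convention is an assumption.
    """
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the two vectors agree. Each agreement
    contributes +1 and each disagreement contributes -1, so the result is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask    # 1 where the bits are equal
    matches = bin(xnor).count("1")      # popcount
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))            # ordinary dot product
packed = binary_dot(pack_bits(a), pack_bits(b), len(a))
assert packed == reference
```

Packing many binary values into one machine word is what lets a single bitwise instruction and a single popcount replace a whole group of multiply-accumulate operations, which is the source of the memory and compute savings the abstract attributes to BNNs.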

MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

Explore Training of Deep Convolutional Neural Networks on Battery-powered Mobile Devices: Design and Application

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

On-Device Neural Net Inference with Mobile GPUs

Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs

MNN: A Universal and Efficient Inference Engine

An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms

PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones

Optimizing the Learning Performance in Mobile Augmented Reality Systems with CNN

ReMoNet: Recurrent Multi-Output Network for Efficient Video Denoising

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Pre-DNNOff: On-Demand DNN Model Offloading Method for Mobile Edge Computing

MobiFace: A Lightweight Deep Learning Face Recognition on Mobile Devices

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

GhostNets on Heterogeneous Devices via Cheap Operations

Profiling and optimizing deep learning inference on mobile GPUs

Distributed Convolutional Neural Network Training on Mobile and Edge Clusters

Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation

Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management

MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions