An efficient GPU-accelerated inference engine for binary neural network on mobile phones

Shengyu He, Haitao Meng, Zhaoheng Zhou, Yongjun Liu, Kai Huang, Gang Chen
DOI: https://doi.org/10.1016/j.sysarc.2021.102156
IF: 5.836
2021-08-01
Journal of Systems Architecture
Abstract:<p>In recent years, deep neural networks (DNNs) have become more powerful and more popular, especially in mobile computing. Applications running on edge AI devices such as smartphones could benefit from the new opportunities enabled by deep learning techniques. However, DNNs are by nature computationally and memory intensive, making them challenging to deploy on mobile devices. Binary neural networks (BNNs) are considered a promising solution that can significantly reduce the memory and computational requirements of DNNs while still offering capabilities similar to those of full-precision DNN models. Existing GPU-accelerated implementations of BNNs are <em>only</em> tailored for desktop platforms. Due to architectural differences, merely porting such implementations to <em>mobile devices</em> yields suboptimal performance or is impossible in some cases. GPU-accelerated implementations of BNNs on mobile devices are therefore still missing from the literature. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for mobile devices that fully exploits the computing power of mobile GPUs for BNN inference. PhoneBit provides a set of operator-level optimizations, including locality-friendly data layout, bit packing with vectorization, and layer integration, for efficient binary convolution. We also provide a detailed implementation and parallelization optimization for PhoneBit to make the best use of the memory bandwidth and computing power of mobile GPUs. Our experimental results show that PhoneBit achieves significant speedup and energy efficiency compared with state-of-the-art frameworks for mobile devices. The PhoneBit open-source library is available for download at <a href="https://code.ihub.org.cn/projects/915/repository/PhoneBit">https://code.ihub.org.cn/projects/915/repository/PhoneBit</a>.</p>
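The abstract names bit packing as one building block of efficient binary convolution. As a rough illustration only, the general idea can be sketched in plain Python: values restricted to +1/-1 are stored one bit each, and a dot product reduces to an XNOR followed by a population count. This is not PhoneBit's implementation, which targets mobile GPUs; the function names `pack_bits` and `binary_dot` and the bit encoding (+1 as 1, -1 as 0) are assumptions made for this sketch.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer.

    Assumed encoding for this sketch: +1 -> bit 1, -1 -> bit 0,
    with element i stored at bit position i.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks positions where the two vectors agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * matches - n.
    """
    mask = (1 << n) - 1                      # keep only the n valid bits
    matches = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * matches - n


# Usage: compare against the ordinary dot product.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
packed_result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `packed_result` and `reference` agree, while the packed form touches one machine word instead of four multiplications, which is the source of the memory and compute savings that BNNs promise.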
computer science, software engineering, hardware & architecture