ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU

Rui Kong, Yuanchun Li, Yizhen Yuan, Linghe Kong
DOI: https://doi.org/10.1145/3581791.3596831
2023-01-01
Abstract: Many activation values of Convolutional Neural Networks (CNNs) are zero because of ReLU (Rectified Linear Unit), one of the most common activation functions in modern neural networks. Since ReLU outputs zero for all negative inputs, existing CNN acceleration approaches estimate which outputs will be zero in order to skip redundant computation. Because these estimates can be wrong, such approaches sacrifice accuracy for efficiency, which leads to difficult trade-offs and cumbersome configuration. In this paper, we introduce ConvReLU++, a lossless acceleration method for CNN inference on mobile devices that accurately detects and skips zero outputs for speedup without failures. The key to early negative detection is a reference-based upper-bound calculation, which ensures that as soon as the intermediate result becomes negative, the final result is guaranteed to be negative. Upon detection, the remaining computation can be skipped and the following ReLU output can simply be set to zero. We rigorously prove the losslessness property of ConvReLU++, analyze the theoretical FLOPs reduction, and show the compatibility of our method with vector-level parallelism on mobile platforms. We implement ConvReLU++ in popular mobile inference frameworks and evaluate it on common deep vision tasks. The results demonstrate that ConvReLU++ achieves 2.90% to 8.91% latency reduction over the original inference framework on edge devices without sacrificing accuracy. Our code can be found at https://github.com/monster119120/conv_relu_plus_plus.
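The idea of lossless early termination can be illustrated with a minimal sketch. It is not the reference-based upper bound of ConvReLU++, whose details the abstract does not give. Instead it uses a simpler, well-known bound and assumes that all inputs are non-negative (for example, because they are outputs of a previous ReLU). Under that assumption, if the non-negative weights are processed first, the accumulator can only decrease afterwards, so an accumulator that is already at or below zero guarantees a zero ReLU output. The function name `relu_dot_early_exit` and its return format are hypothetical and chosen for illustration only.

```python
def relu_dot_early_exit(weights, inputs, bias=0.0):
    """Compute relu(dot(weights, inputs) + bias) with an exact early exit.

    Illustrative assumption (not taken from the paper): every input is >= 0.
    Returns (output, number_of_multiply_adds_performed).
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    if any(x < 0 for x in inputs):
        raise ValueError("this sketch assumes non-negative inputs")

    # Visit non-negative weights first, negative weights last.
    # In a real kernel this ordering would be fixed once, offline.
    order = sorted(range(len(weights)), key=lambda i: weights[i] < 0)

    acc = bias
    macs = 0
    for i in order:
        if weights[i] < 0 and acc <= 0:
            # Only negative weights remain and inputs are non-negative,
            # so acc is an upper bound on the final sum: ReLU yields 0.
            return 0.0, macs
        acc += weights[i] * inputs[i]
        macs += 1
    return max(acc, 0.0), macs


if __name__ == "__main__":
    w = [1.0, -2.0, 0.5, -3.0]
    x = [1.0, 2.0, 2.0, 1.0]
    out, macs = relu_dot_early_exit(w, x, bias=-3.0)
    print(out, macs, "of", len(w), "multiply-adds")
```

The skipped terms could only have made the sum smaller, so the result matches the full computation whenever the arithmetic is exact. With floating-point values, reordering the summation can change rounding slightly, which is one reason a real implementation needs a more careful bound than this sketch.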