Hardware-aware Approach to Deep Neural Network Optimization

Hengyi Li, Lin Meng
DOI: https://doi.org/10.1016/j.neucom.2023.126808
IF: 6
2023-01-01
Neurocomputing
Abstract: Deep neural networks (DNNs) have become a pivotal technology in many fields, with remarkable achievements. Nevertheless, their heavy workload and inherent redundancy remain challenges for both practitioners and researchers. Although many studies optimize DNNs, the parallelism of the underlying hardware is generally underutilized, leading to inefficient use of hardware resources. To address this gap, the paper proposes IHSOpti, a hardware-aware mechanism that combines hardware characteristics with software algorithms for DNN optimization. IHSOpti aims to exploit the full potential of modern hardware parallelism, with particular emphasis on pipelining. Specifically, IHSOpti formulates a sparse training algorithm, Polar_HSPG, which incorporates the newly proposed layer-wise refined polarization regularizer (LWPolar) and is built on the half-space projected gradient (HSPG) method. IHSOpti then pioneers a residual strategy for optimizing layer-level redundancy in neural networks, capitalizing on the pipelining capabilities of current hardware. Experiments show that IHSOpti attains high pruning ratios in both parameters and FLOPs: up to 96.90% (parameters) and 82.73% (FLOPs) at 93.34% accuracy for VGGBN, 97.69% and 95.24% at 94.69% accuracy for ResNet, and 98.07% and 97.80% at 95.73% accuracy for the cutting-edge network RegNet. Running efficiency also improves markedly, with speedups ranging from 3.63× to 8.20× on CPUs and from 1.22× to 2.25× on GPUs. These results surpass the latest advances in the field. By incorporating specific hardware characteristics, IHSOpti provides a comprehensive and effective approach to harnessing the intrinsic parallelism of contemporary hardware platforms for DNNs.
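To put the reported pruning ratios in perspective, the short sketch below converts each one into the fraction of the network that remains and the corresponding compression factor. It assumes the usual convention that a pruning ratio is the share of parameters or FLOPs removed relative to the unpruned baseline; the abstract does not state the definition explicitly, and the helper names are illustrative only, not taken from the paper.

```python
# Illustrative arithmetic only: assumes "pruning ratio" means the percentage
# of parameters (or FLOPs) removed relative to the unpruned baseline.

def remaining_fraction(pruning_ratio_percent: float) -> float:
    """Fraction of the original parameters or FLOPs left after pruning."""
    return 1.0 - pruning_ratio_percent / 100.0


def compression_factor(pruning_ratio_percent: float) -> float:
    """How many times smaller the pruned quantity is than the original."""
    return 1.0 / remaining_fraction(pruning_ratio_percent)


# (parameter pruning ratio, FLOPs pruning ratio) as reported in the abstract.
reported = {
    "VGGBN": (96.90, 82.73),
    "ResNet": (97.69, 95.24),
    "RegNet": (98.07, 97.80),
}

for name, (params, flops) in reported.items():
    print(
        f"{name}: parameters reduced {compression_factor(params):.1f}x, "
        f"FLOPs reduced {compression_factor(flops):.1f}x"
    )
```

Under this assumption, the FLOPs reductions are considerably larger than the measured speedups (3.63× to 8.20× on CPUs, 1.22× to 2.25× on GPUs), which is consistent with the abstract's point that theoretical savings depend on how well the hardware's parallelism is used.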