Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

Zhiyuan Chen, Yufei Ma, Zhongfeng Wang
DOI: https://doi.org/10.1109/tcsi.2022.3166524
2022-07-01
Abstract: The appealing properties of low area, low power, and high bit-error tolerance have made Stochastic Computing (SC) a promising alternative to conventional binary arithmetic for many computation-intensive tasks, e.g., convolutional neural networks (CNNs). However, current SC-based CNN accelerators suffer from intrinsic computation error and exponentially growing latency. In this work, we optimize both the architecture of the SC multiply-and-accumulate (MAC) unit and the overall acceleration strategy of the CNN accelerator to favor SC. A low-complexity bit-stream-extending method is proposed to suppress the computation error of SC and ensure that a trained fixed-point model can be deployed on SC-based hardware without fine-tuning. In addition, a distribution-determined partition scheme is developed to design a hybrid stochastic-binary computing (SBC) MAC unit, which speeds up the processing of bit streams at minimal overhead. For the overall accelerator, the SBC-based MAC array is extended to reuse hardware resources and improve throughput, since a judiciously chosen loop unrolling strategy better benefits SC operations. The proposed CNN accelerator with the extended SBC-MAC array is synthesized and validated using TSMC 28nm CMOS technology on several representative CNNs targeting the ImageNet dataset. Compared with a precise binary implementation, the proposed design achieves a 44% area reduction and a 50% power saving while incurring only 4% additional computation latency and 0.5% accuracy degradation.
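To make the error and latency trade-off mentioned in the abstract concrete, the following is a minimal Python sketch of textbook unipolar stochastic multiplication: two values in [0, 1] are encoded as random bit streams and multiplied with a bitwise AND. It is not the paper's SBC-MAC unit or its bit-stream-extending method; the function names (`to_stream`, `sc_multiply`, `mean_abs_error`) and the software random number generator are assumptions made purely for illustration.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length, rng):
    """Estimate a * b by AND-ing two independent streams and counting ones."""
    sa = to_stream(a, length, rng)
    sb = to_stream(b, length, rng)
    ones = sum(x & y for x, y in zip(sa, sb))
    return ones / length

def mean_abs_error(a, b, length, trials=200, seed=0):
    """Average absolute error of the SC product over repeated trials."""
    rng = random.Random(seed)
    total = sum(abs(sc_multiply(a, b, length, rng) - a * b) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # Longer streams reduce the error but take proportionally more cycles.
    for length in (16, 256, 4096):
        print(length, mean_abs_error(0.5, 0.75, length))
```

In this simplified model, representing an n-bit value without quantization loss needs a stream of about 2^n bits, which is one way to read the "exponentially growing latency" the abstract refers to, while the residual random fluctuation corresponds to the intrinsic computation error.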