Abstract: With recent advances in the Internet of Things (IoT), it has become very attractive to implement deep convolutional neural networks (DCNNs) on embedded and portable systems. At present, executing software-based DCNNs requires high-performance server clusters in practice, which restricts their widespread deployment on mobile devices. To overcome this issue, considerable research effort has gone into developing highly parallel, specialized DCNN hardware using GPGPUs, FPGAs, and ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power/energy and hardware footprint can be achieved compared to conventional binary arithmetic implementations. These savings in power (energy) and hardware resources open up a large design space for enhancing the scalability and robustness of hardware DCNNs. This paper presents the first comprehensive design and optimization framework for SC-based DCNNs (SC-DCNNs). We first present the optimal designs of the function blocks that perform the basic operations, i.e., inner product, pooling, and activation function. We then propose the optimal design of four types of combinations of basic function blocks, named feature extraction blocks, which are in charge of extracting features from input feature maps. In addition, weight storage methods are investigated to reduce the area and power/energy consumption of storing weights. Finally, the whole SC-DCNN implementation is optimized, with feature extraction blocks carefully selected, to minimize area and power/energy consumption while maintaining a high level of network accuracy.
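As a rough illustration of the SC arithmetic the abstract describes, the following Python sketch simulates bit-streams in software. It is not the paper's hardware design: the function names, the stream length, and the example values are all assumptions made for illustration. It shows AND-gate multiplication in the unipolar encoding (values in [0, 1]) and multiplexer-based scaled addition. For the bipolar range [-1, 1], where a value x is encoded with P(1) = (x + 1) / 2, multiplication is conventionally done with an XNOR gate, which is also shown.

```python
import random

def encode_unipolar(p, n, rng):
    """Bit-stream of length n whose fraction of ones approximates p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def encode_bipolar(x, n, rng):
    """Bit-stream for x in [-1, 1], using P(1) = (x + 1) / 2."""
    return encode_unipolar((x + 1) / 2, n, rng)

def decode_unipolar(bits):
    """Recover the value by counting ones."""
    return sum(bits) / len(bits)

def decode_bipolar(bits):
    """Map the fraction of ones back to [-1, 1]."""
    return 2 * decode_unipolar(bits) - 1

def and_multiply(a, b):
    """Unipolar multiplication: bitwise AND of two independent streams."""
    return [x & y for x, y in zip(a, b)]

def xnor_multiply(a, b):
    """Bipolar multiplication: bitwise XNOR of two independent streams."""
    return [1 - (x ^ y) for x, y in zip(a, b)]

def mux_add(a, b, select):
    """Scaled addition: a 2-to-1 multiplexer driven by a P(1) = 0.5 select
    stream yields a stream encoding (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

# Hypothetical example values; a fixed seed keeps the run repeatable.
rng = random.Random(0)
n = 1 << 14  # stream length (assumed); longer streams reduce the error

a = encode_unipolar(0.5, n, rng)
b = encode_unipolar(0.4, n, rng)
sel = encode_unipolar(0.5, n, rng)
uni_product = decode_unipolar(and_multiply(a, b))    # close to 0.5 * 0.4
scaled_sum = decode_unipolar(mux_add(a, b, sel))     # close to (0.5 + 0.4) / 2

c = encode_bipolar(-0.5, n, rng)
d = encode_bipolar(0.6, n, rng)
bi_product = decode_bipolar(xnor_multiply(c, d))     # close to -0.5 * 0.6

print(uni_product, scaled_sum, bi_product)
```

The results are estimates whose error shrinks as the stream length grows, which reflects the accuracy versus latency trade-off inherent in SC. Note that the multiplexer computes a scaled sum, (a + b) / 2, rather than a + b, so the output stays within the representable range.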

Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

An Energy-Efficient Mixed-Signal Parallel Multiply-Accumulate (MAC) Engine Based on Stochastic Computing

Reconfigurable Spatial-Parallel Stochastic Computing for Accelerating Sparse Convolutional Neural Networks

Parallel Convolutional Neural Network (CNN) Accelerators Based on Stochastic Computing

Stochastic Computing Hardware Design and Optimization for Convolutional Neural Networks

Accurate yet Efficient Stochastic Computing Neural Acceleration with High Precision Residual Fusion

Not Your Father’s Stochastic Computing (SC)! Efficient Yet Accurate End-to-End SC Accelerator Design

Stochastic-Binary Hybrid Spatial Coding Multiplier for Convolutional Neural Network Accelerator

An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks

Stochastic Computing Convolution Neural Network Architecture Reinvented For Highly Efficient Artificial Intelligence Workload on Field Programmable Gate Array

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Memory System Designed for Multiply-Accumulate (MAC) Engine Based on Stochastic Computing

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

Parallel Hybrid Stochastic-Binary-Based Neural Network Accelerators

Efficient yet Accurate End-to-End SC Accelerator Design

Efficient Non-Linear Adder for Stochastic Computing with Approximate Spatial-Temporal Sorting Network

Towards Budget-Driven Hardware Optimization for Deep Convolutional Neural Networks Using Stochastic Computing

Efficient Parallel Stochastic Computing Multiply-Accumulate (MAC) Technique Using Pseudo-Sobol Bit-Streams

ALSCA: A Large-Scale Sparse CNN Accelerator Using Position-First Dataflow and Input Channel Merging Approach