Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines

Siting Liu,Honglan Jiang,Leibo Liu,Jie Han
DOI: https://doi.org/10.1109/tcad.2018.2858363
IF: 2.9
2018-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract:Gradient descent (GD) is a widely used optimization algorithm in machine learning. In this paper, a novel stochastic computing GD circuit (SC-GDC) is proposed by encoding the gradient information in stochastic sequences. Inspired by the structure of a neuron, a stochastic integrator is used to optimize the weights in a learning machine by its "inhibitory" and "excitatory" inputs. Specifically, two AND (or XNOR) gates for the unipolar representation (or the bipolar representation) and one stochastic integrator are, respectively, used to implement the multiplications and accumulations in a GD algorithm. Thus, the SC-GDC is very area- and power-efficient. As per the formulation of the proposed SC-GDC, it provides unbiased estimate of the optimized weights in a learning algorithm. The proposed SC-GDC is then used to implement a least-mean-square algorithm and a softmax regression. With a similar accuracy, the proposed design achieves more than 30x improvement in throughput per area (TPA) and consumes less than 13% of the energy per training sample, compared with a fixed-point implementation. Moreover, a signed SC-GDC is proposed for training complex neural networks (NNs). It is shown that for a 784-128-128-10 fully connected NN, the signed SC-GDC produces a similar training result with its fixed-point counterpart, while achieving more than 90% energy saving and 82% reduction in training time with more than 50x improvement in TPA.
What problem does this paper attempt to address?