Assessment of inference accuracy and memory capacity of computation-in-memory enabled neural network due to quantized weights, gradients, input and output signals, and memory non-idealities

Adil Padiyal, Ayumu Yamada, Naoko Misawa, Chihiro Matsui, Ken Takeuchi
DOI: https://doi.org/10.35848/1347-4065/ad2e45
IF: 1.5
2024-02-28
Japanese Journal of Applied Physics
Abstract: This paper proposes an approach to enhance the efficiency of computation-in-memory enabled neural networks. The proposed methods partially quantize the learning and inference processes within the neural network to increase training and inference speed while reducing energy and memory consumption. The impact of the quantization imposed by computation-in-memory is evaluated in terms of inference accuracy. The effect on network accuracy of non-idealities introduced by different memories, such as ReRAM, is also reported. The results indicate that a certain quantization bit-precision threshold is necessary for weights, input/output data, and gradients to maintain an acceptable level of inference accuracy. Notably, the experiments demonstrate a modest degradation of approximately 2.8% in inference accuracy compared to the neural network trained without computation-in-memory. This accuracy trade-off is accompanied by a substantial memory capacity improvement, with best-case memory usage reductions of 62% and 93% during the training and inference phases, respectively.
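The trade-off the abstract describes, fewer bits per value in exchange for some loss of precision, can be illustrated with a minimal sketch. This is not the authors' method: it assumes a generic symmetric uniform quantizer and a 32-bit baseline, the function names (`quantize_uniform`, `dequantize`, `memory_reduction`) and the sample weights are hypothetical, and it models neither ReRAM non-idealities nor any storage overhead. It does not reproduce the 62% and 93% figures reported in the paper.

```python
def quantize_uniform(values, bits):
    """Map floats to signed integer codes on a symmetric uniform grid.

    Assumption: one scale factor per tensor, derived from the largest
    absolute value. Real computation-in-memory schemes may differ.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # step size of the grid
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def memory_reduction(bits, baseline_bits=32):
    """Fraction of storage saved per value relative to the baseline width."""
    return 1.0 - bits / baseline_bits


# Hypothetical weights, chosen only for illustration.
weights = [0.4, -1.0, 0.1, 0.8]
codes, scale = quantize_uniform(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)                 # integer codes stored in memory
print(max_error)             # worst-case rounding error, at most scale / 2
print(memory_reduction(4))   # per-value saving versus 32-bit storage
```

Under these assumptions the rounding error per value is bounded by half the grid step, so lowering the bit width saves memory while coarsening the grid, which is consistent with the abstract's observation that accuracy holds only above a certain bit-precision threshold.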
physics, applied