Special Topic on Nonvolatile Memory for Efficient Implementation of Neural/Neuromorphic Computing

Shimeng Yu
DOI: https://doi.org/10.1109/jxcdc.2019.2913526
2019-06-01
Abstract: In recent years, artificial intelligence (AI) based on machine/deep learning has shown significantly improved accuracy in large-scale visual/auditory recognition and classification tasks, in some cases even surpassing human-level accuracy. In particular, deep neural networks (DNNs) and their variants have proved their efficacy in a wide range of image, video, speech, and biomedical applications. To achieve incremental accuracy improvements, state-of-the-art deep learning algorithms tend to aggressively increase the depth and size of the network, which imposes ever-increasing demands on computational capacity and storage in hardware. Although graphics processing units (GPUs) are the dominant technology for training DNN models in the cloud, application-specific integrated circuit (ASIC) hardware accelerators are being developed to run large-scale deep learning algorithms for inference (or even training) on-chip. This provides opportunities to bring AI closer to edge devices for applications such as autonomous driving, machine translation, and smart wearable devices, where severe constraints exist in performance, power, and area.