A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation

Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin, Leibo Liu, Shaojun Wei, Shouyi Yin
DOI: https://doi.org/10.23919/VLSICircuits52068.2021.9492420
2021-01-01
Abstract: A processor named Trainer, which exploits dynamic weight pruning (DWP), is proposed for energy-efficient deep-neural-network (DNN) training on edge devices. It has three key features: 1) an implicit redundancy speculation unit (IRSU) improves throughput by 1.46×; 2) a dataflow that allows reuse-adaptive dynamic compression and PE regrouping increases utilization by 1.52×; 3) a data-retrieval-eliminated batch-normalization (BN) unit (REBU) saves 37.1% of energy. Trainer achieves a peak energy efficiency of 276.55TFLOPS/W. Compared with the state-of-the-art sparse DNN training processor, it reduces training energy by 2.23× and offers a 1.76× training speedup.
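The abstract does not describe how REBU eliminates data retrieval. As a hedged illustration only, one standard way to avoid re-reading activations when computing BN statistics is to accumulate the running sum Σx and sum of squares Σx² while outputs are produced, then derive mean = E[x] and variance = E[x²] − E[x]². The sketch below compares this single-pass form with the conventional two-pass form. The function names `bn_stats_single_pass`, `bn_stats_two_pass`, and `batch_norm` are hypothetical, and this is not claimed to be Trainer's actual reformulation.

```python
import math
from typing import Iterable, List, Tuple


def bn_stats_single_pass(values: Iterable[float]) -> Tuple[float, float]:
    """Mean and biased variance from ONE streaming pass (hypothetical helper).

    Only three accumulators are kept (count, sum, sum of squares), so the
    activations never have to be fetched a second time.
    """
    n, s, sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        sq += v * v
    if n == 0:
        raise ValueError("need at least one value")
    mean = s / n
    # E[x^2] - E[x]^2 can cancel catastrophically in low precision, so
    # clamp tiny negative results caused by rounding.
    var = max(sq / n - mean * mean, 0.0)
    return mean, var


def bn_stats_two_pass(values: List[float]) -> Tuple[float, float]:
    """Conventional two-pass statistics: the data is read once per pass."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def batch_norm(values: List[float], mean: float, var: float,
               gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize with precomputed statistics, then scale and shift."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mean) * inv_std + beta for v in values]


if __name__ == "__main__":
    acts = [1.0, 2.0, 3.0, 4.0]
    print(bn_stats_single_pass(acts))  # (2.5, 1.25)
    print(bn_stats_two_pass(acts))     # (2.5, 1.25)
```

Both functions return the same statistics on this input; the single-pass version trades numerical robustness for fewer memory accesses, which is why hardware that uses it typically needs wide accumulators.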