A 28nm 1.07TFLOPS/mm<sup>2</sup> Dynamic-Precision Training Processor with Online Dynamic Execution and Multi-Level-Aligned Block-FP Processing

Yixiong Yang, Ruoyang Liu, Chenhan Wei, Wenxun Wang, Wenyu Sun, Jinshan Yue, Huazhong Yang, Yongpan Liu
DOI: https://doi.org/10.1109/CICC57935.2023.10121210
2023-01-01
Abstract: Training deep learning (DL) models consumes a huge amount of time and energy in cloud servers and edge devices, requiring energy-efficient processors [1–5] to meet the rapidly growing demand for AI. Training processors either use a high-precision floating-point (FP) format to provide robust training results, or a low-precision format that increases efficiency at the cost of accuracy. Mixed-precision training (MPT) promises both high accuracy and high efficiency. Manual mixed precision [5] usually uses a coarse-grained (per-layer) mapping, which limits training accuracy. Automatic precision search [6] provides accurate, fine-grained precision mapping, but its high search latency slows down the overall training process.