A Fully Quantized Training Accelerator for Diffusion Network with Tensor Type & Noise Strength Aware Precision Scheduling

Ruoyang Liu, Wenxun Wang, Chen Tang, Weichen Gao, Huazhong Yang, Yongpan Liu
DOI: https://doi.org/10.1109/tcsii.2024.3439319
2024-01-01
Abstract: Fine-grained mixed-precision fully-quantized methods have great potential to accelerate neural network training, but existing methods exhibit large accuracy loss for more complex models such as diffusion networks. This brief introduces a fully-quantized training accelerator for diffusion networks. It features a novel training framework with tensor-type- and noise-strength-aware precision scheduling to optimize bit-width allocation. The processing cluster design enables dynamic switching of bit-width mappings for model weights, allows concurrent processing in 4 different bit-widths, and incorporates a gradient square sum collection unit to minimize on-chip memory access. Experimental results show up to 2.4× training speedup and 81% operation bit-width overhead reduction compared to existing designs, with minimal impact on image generation quality.