A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication

Yubin Qin, Yang Wang, Xiaolong Yang, Zhiren Zhao, Shaojun Wei, Yang Hu, Shouyi Yin
DOI: https://doi.org/10.1109/vlsitechnologyandcir46783.2024.10631322
2024-01-01
Abstract: This paper proposes an energy-efficient diffusion model processor that exploits inter-time-step computation redundancy. It has three features: 1) a semantic-segment sparse convolution engine that removes 88.5% of duplicated convolution layer (CL) computations; 2) a resemble trivial attention exponent inheritance design that improves attention layer (AL) computation efficiency by 16.7×; and 3) a bipolar floating-point multiplier that saves 25.4% of multiplication effort by avoiding ineffective mantissa multiplication for both CL and AL. The processor achieves a peak efficiency of 52.01 TFLOPS/W and reduces energy by 23.14× and 3.94× compared to state-of-the-art CL and AL processors, respectively.
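The idea of inter-time-step redundancy in convolution layers can be illustrated with a small software sketch. The abstract does not describe how the semantic-segment sparse convolution engine works internally, so the code below is not the paper's method; it only shows the generic observation that convolution is linear, so the output at one time step can be updated from the previous step's output using just the input positions that changed. The function names `conv2d_same` and `delta_conv2d`, the single-channel setup, and the `tol` threshold are all assumptions made for illustration.

```python
import numpy as np


def conv2d_same(x, k):
    """Dense single-channel 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def delta_conv2d(prev_out, prev_x, x, k, tol=0.0):
    """Update the previous step's output using only inputs that changed.

    Hypothetical helper: relies on conv(x) = conv(prev_x) + conv(x - prev_x).
    Returns the updated output and the fraction of input positions skipped.
    """
    delta = x - prev_x
    changed = np.argwhere(np.abs(delta) > tol)
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    h, w = x.shape
    out = prev_out.copy()
    for r, c in changed:
        # Scatter the contribution of delta[r, c] to every output it affects.
        for i in range(kh):
            for j in range(kw):
                orow, ocol = r - i + ph, c - j + pw
                if 0 <= orow < h and 0 <= ocol < w:
                    out[orow, ocol] += k[i, j] * delta[r, c]
    skipped = 1.0 - len(changed) / x.size
    return out, skipped
```

With `tol=0.0` the update matches a dense recomputation up to floating-point rounding; a positive `tol` would trade accuracy for more skipped positions, which is a design choice of this sketch rather than something stated in the abstract.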