H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Yandong Luo, Shimeng Yu
DOI: https://doi.org/10.1145/3649219
IF: 1.447
2024-02-28
ACM Transactions on Design Automation of Electronic Systems
Abstract: Prior hardware accelerator designs primarily focused on single-chip solutions for 10MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator designs due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). In the system-level evaluation, an energy efficiency of 10 TOPS/W is achieved for the BERT and GPT-2 models, which is about 2.6× to 3.1× higher than the baseline with a 7nm TPU and stacked FeFET memory.
computer science, software engineering, hardware & architecture