A 10TFLOPS Datacenter-Oriented GPU with 4-Corner Stacked 64GB Memory by the Means of 2.5D Packaging Technology

Shuang Wang, Weiliang Chen, Xueqing Li, Leibo Liu, Huazhong Yang
DOI: https://doi.org/10.1109/a-sscc58667.2023.10347909
2023-01-01
Abstract: GPUs offer a large number of parallel computing units and fast performance growth, which makes them well suited to datacenters [1]. Many schemes have been proposed to improve GPU performance, such as adding compute cores, raising clock frequency, and increasing memory capacity [2][3]. However, as Moore's Law slows down, GPU dies keep growing larger, and such ultra-large chips are very expensive to manufacture because their yield is low. This paper demonstrates that MCM (Multi-Chip Module) design and 2.5D packaging technology can be applied to address the problem of large chip area. In place of a traditional monolithic GPU, a 4-corner memory-stacked GPU is adopted, built with 2.5D packaging technology (Die on Wafer on Substrate with Interposer). The new scheme also uses HBM2 based on TSV (Through-Silicon Via) technology, which brings area benefits. In addition, a 3+1-level (DIE-Interposer-PKG-PCB) circuit design and a 4-mark-area alignment method are used to solve the SI (Signal Integrity) problem, and packaging IR drop is optimized by cell flipping and special LVT (Low Voltage Threshold) cell selection. A datacenter GPU is designed with performance exceeding 10 TFLOPS (FP32) and 64 GB of memory. The results show that these technologies effectively reduce area and increase the yield of datacenter GPUs.
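The link between die area and yield that motivates the multi-chip approach can be illustrated with a minimal sketch. It assumes the simple Poisson yield model Y = exp(-A * D0), which the abstract does not say the authors use. The defect density, the die areas, and the function names are hypothetical values chosen for illustration, not figures from the paper. The sketch also assumes dies are tested before assembly, and it ignores packaging yield and the extra area needed for die-to-die interfaces.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_unit(total_area_cm2: float, n_dies: int,
                          defects_per_cm2: float) -> float:
    """Average silicon area (cm^2) consumed per working product.

    The total logic area is split evenly across n_dies. Each die is
    assumed to be tested before assembly, so a defect scraps only
    that one die rather than the whole product.
    """
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.1          # hypothetical defect density, defects per cm^2
    TOTAL_AREA = 8.0  # hypothetical total silicon area, cm^2

    for n in (1, 2, 4):
        die_yield = poisson_yield(TOTAL_AREA / n, D0)
        cost = silicon_per_good_unit(TOTAL_AREA, n, D0)
        print(f"{n} die(s): per-die yield {die_yield:.3f}, "
              f"silicon per good unit {cost:.2f} cm^2")
```

Under this model, yield falls exponentially with die area, so splitting the same total area across several smaller dies lowers the silicon consumed per working unit. Real cost comparisons would also need to account for interposer and assembly costs.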