Heterogeneous Memory Architecture Accommodating Processing-in-Memory on SoC for AIoT Applications

Kangyi Qiu, Yaojun Zhang, Bonan Yan, Ru Huang
DOI: https://doi.org/10.1109/asp-dac52403.2022.9712544
2022-01-01
Abstract: Processing-In-Memory (PIM) technology is one of the most promising candidates for AIoT applications due to its attractive characteristics, such as low computation latency, high throughput, and high power efficiency. However, how to efficiently utilize PIM within a System-on-Chip (SoC) architecture has scarcely been discussed. In this paper, we demonstrate a series of solutions, from hardware architecture to algorithm, to maximize the benefits of PIM design. First, we propose a Heterogeneous Memory Architecture (HMA) that augments an existing SoC with PIM via high-throughput on-chip buses. Then, based on the given HMA structure, we propose an HMA tensor mapping approach to partition tensors and deploy general matrix multiplication operations on PIM structures. Both the HMA hardware and the HMA tensor mapping approach harness the programmability of the mature embedded CPU solution stack and maximize the efficiency of PIM technology. The whole HMA system reduces power consumption by 416× and design area by 44.6% compared with the latest accelerator solutions. The evaluation also shows that our design reduces operation latency for TinyML applications by 430× and 11× compared with the state-of-the-art baseline and PIM without optimization, respectively.
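The abstract does not describe how the HMA tensor mapping works, so the following is only a generic sketch of the underlying idea of partitioning a weight tensor into tiles and running a general matrix multiplication tile by tile. It is not the authors' method. The tile dimensions `PIM_ROWS` and `PIM_COLS`, and the functions `partition`, `pim_mvm`, and `tiled_gemm`, are hypothetical names chosen for illustration, and `pim_mvm` is a plain software stand-in for what a PIM macro would compute.

```python
# Hypothetical tile dimensions of one PIM macro (assumptions, not from the paper).
PIM_ROWS = 4  # input-vector length one macro accepts
PIM_COLS = 4  # number of output columns one macro produces


def partition(weights, rows=PIM_ROWS, cols=PIM_COLS):
    """Split a K x N weight matrix into tiles of at most rows x cols.

    Returns a list of (row_offset, col_offset, tile) tuples.
    """
    k, n = len(weights), len(weights[0])
    tiles = []
    for r0 in range(0, k, rows):
        for c0 in range(0, n, cols):
            tile = [row[c0:c0 + cols] for row in weights[r0:r0 + rows]]
            tiles.append((r0, c0, tile))
    return tiles


def pim_mvm(tile, x_slice):
    """Software stand-in for one matrix-vector multiply on a single tile."""
    return [
        sum(x_slice[i] * tile[i][j] for i in range(len(tile)))
        for j in range(len(tile[0]))
    ]


def tiled_gemm(a, weights):
    """Compute A (M x K) times W (K x N) by accumulating per-tile partial sums."""
    tiles = partition(weights)
    n = len(weights[0])
    out = [[0] * n for _ in a]
    for m, x in enumerate(a):
        for r0, c0, tile in tiles:
            # Each tile sees only the slice of the input that matches its rows.
            partial = pim_mvm(tile, x[r0:r0 + len(tile)])
            for j, value in enumerate(partial):
                out[m][c0 + j] += value
    return out
```

In this sketch, tiles that share a column offset contribute partial sums to the same output columns, which is why the results are accumulated rather than assigned. A real mapping would also have to decide which tiles stay resident in which memory and how data moves over the on-chip bus, which this example does not model.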