SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis

Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, Chi-Ying Tsui
2024-11-25
Abstract: Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. However, the need for customized memory cells and logic components currently necessitates significant manual effort in DCIM design. Existing tools for facilitating DCIM macro designs struggle to optimize subcircuit synthesis to meet user-defined performance criteria, thereby limiting the potential system-level acceleration that DCIM can offer. To address these challenges and enable agile design of DCIM macros with optimal architectures, we present SynDCIM, a performance-aware DCIM compiler that employs multi-spec-oriented subcircuit synthesis. SynDCIM features an automated performance-to-layout generation process that aligns with user-defined performance expectations. This is supported by a scalable subcircuit library and a multi-spec-oriented searching algorithm for effective subcircuit synthesis. The effectiveness of SynDCIM is demonstrated through extensive experiments and validated with a test chip fabricated in a 40nm CMOS process. Testing results reveal that designs generated by SynDCIM exhibit competitive performance when compared to state-of-the-art manually designed DCIM macros.
Hardware Architecture
What problem does this paper attempt to address?
The paper addresses two problems: the large amount of manual design work required by current digital computing-in-memory (DCIM) designs, and the inability of existing tools to optimize subcircuit synthesis against user-defined performance criteria. Because existing DCIM macro design tools struggle to optimize subcircuit synthesis, the achievable system-level acceleration is limited, and the resulting macros adapt poorly to diverse AI application scenarios and to the integration requirements of modern digital very-large-scale integration (VLSI) workflows. To address these challenges, the paper proposes SynDCIM, a performance-aware DCIM compiler that adopts a multi-spec-oriented subcircuit synthesis method. SynDCIM automates the performance-to-layout generation process and aligns it with user-defined performance expectations. Its main goal is to provide flexible data precision, scalable array parameters, and multi-spec-oriented optimization, ultimately generating an optimal DCIM macro design.

### Main problem summary:

1. **Large amount of manual design work**: Existing DCIM designs require a great deal of customization work, including the design of read-out peripherals to support bit-configurable integer and floating-point precision.
2. **Lack of comprehensive design automation tools**: Existing tools cannot effectively optimize subcircuit synthesis, so the resulting DCIM macro designs do not meet performance expectations.
3. **Insufficient performance optimization**: Different AI applications (such as vision, language processing, and robotics) and acceleration scenarios (such as wearable devices, mobile platforms, and cloud computing) require different performance optimizations, and existing tools cannot fully meet these requirements.

### Goals of SynDCIM:

- **Automated performance-to-layout generation**: Automatically generate the optimal DCIM architecture and complete layout according to user-defined performance metrics.
- **Flexible data precision and scalable array parameters**: Support multiple data formats and array sizes.
- **Multi-spec-oriented optimization**: Ensure that the generated DCIM macros are optimal in terms of energy efficiency, throughput, and area efficiency.

Through these improvements, SynDCIM aims to enhance the flexibility and efficiency of DCIM designs, enabling them to better adapt to the requirements of various AI applications and simplify their integration in modern VLSI workflows.
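
To make "multi-spec-oriented" selection concrete, the sketch below filters a small set of candidate subcircuit configurations against user-defined minimum specs and keeps only the non-dominated ones (a Pareto front) over energy efficiency, throughput, and area efficiency. This is an illustration of the general idea only, not SynDCIM's actual searching algorithm, which this summary does not describe. The names `Candidate`, `Spec`, and `select`, as well as all numeric values (arbitrary units), are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical subcircuit configuration; all metrics are higher-is-better."""
    name: str
    energy_eff: float  # energy efficiency, arbitrary units
    throughput: float  # throughput, arbitrary units
    area_eff: float    # area efficiency, arbitrary units


@dataclass(frozen=True)
class Spec:
    """Hypothetical user-defined minimum requirements for each metric."""
    energy_eff: float
    throughput: float
    area_eff: float


def meets_spec(c: Candidate, spec: Spec) -> bool:
    """True if the candidate satisfies every minimum in the spec."""
    return (c.energy_eff >= spec.energy_eff
            and c.throughput >= spec.throughput
            and c.area_eff >= spec.area_eff)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on all metrics and better on one."""
    no_worse = (a.energy_eff >= b.energy_eff
                and a.throughput >= b.throughput
                and a.area_eff >= b.area_eff)
    better = (a.energy_eff > b.energy_eff
              or a.throughput > b.throughput
              or a.area_eff > b.area_eff)
    return no_worse and better


def select(cands: List[Candidate], spec: Spec) -> List[Candidate]:
    """Keep spec-feasible candidates that no other feasible candidate dominates."""
    feasible = [c for c in cands if meets_spec(c, spec)]
    return [c for c in feasible
            if not any(dominates(o, c) for o in feasible)]


# Made-up example values, for illustration only.
CANDIDATES = [
    Candidate("A", energy_eff=10.0, throughput=2.0, area_eff=5.0),
    Candidate("B", energy_eff=8.0, throughput=3.0, area_eff=4.0),
    Candidate("C", energy_eff=7.0, throughput=2.0, area_eff=3.0),
    Candidate("D", energy_eff=12.0, throughput=1.0, area_eff=6.0),
]
SPEC = Spec(energy_eff=6.0, throughput=2.0, area_eff=3.0)

if __name__ == "__main__":
    for c in select(CANDIDATES, SPEC):
        print(c.name)
```

With these made-up values, `D` is rejected for missing the throughput minimum and `C` is dropped because `A` is at least as good on every metric, which leaves `A` and `B` as the trade-off options. A real compiler would search a much larger design space and evaluate the metrics from circuit models or simulation rather than from a fixed table.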