Abstract: Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. However, the need for customized memory cells and logic components currently necessitates significant manual effort in DCIM design. Existing tools for facilitating DCIM macro designs struggle to optimize subcircuit synthesis to meet user-defined performance criteria, thereby limiting the potential system-level acceleration that DCIM can offer. To address these challenges and enable agile design of DCIM macros with optimal architectures, we present SynDCIM, a performance-aware DCIM compiler that employs multi-spec-oriented subcircuit synthesis. SynDCIM features an automated performance-to-layout generation process that aligns with user-defined performance expectations. This is supported by a scalable subcircuit library and a multi-spec-oriented searching algorithm for effective subcircuit synthesis. The effectiveness of SynDCIM is demonstrated through extensive experiments and validated with a test chip fabricated in a 40nm CMOS process. Testing results reveal that designs generated by SynDCIM exhibit competitive performance when compared to state-of-the-art manually designed DCIM macros.

What problem does this paper attempt to address?

The paper targets two problems in current digital computing-in-memory (DCIM) design: the large amount of manual design work, and the inability of existing tools to optimize subcircuit synthesis against user-defined performance criteria. Because existing DCIM macro design tools struggle with subcircuit synthesis, the achievable system-level acceleration is limited, and the tools cannot adapt to diverse AI application scenarios or to the integration requirements of modern digital very-large-scale integration (VLSI) workflows. To address these challenges, the paper proposes SynDCIM, a performance-aware DCIM compiler that adopts multi-specification-oriented subcircuit synthesis. SynDCIM automates the performance-to-layout generation process and aligns it with user-defined performance expectations, with the goal of supporting flexible data precision, scalable array parameters, and multi-specification-oriented optimization, and ultimately generating an optimal DCIM macro design.

### Main problem summary:

1. **Large amount of manual design work**: Existing DCIM designs require a great deal of customization work, including the design of read-out peripherals to support bit-configurable integer and floating-point precision.
2. **Lack of comprehensive design automation tools**: Existing tools cannot effectively optimize subcircuit synthesis, so the resulting DCIM macro designs do not meet performance expectations.
3. **Insufficient performance optimization**: Different AI applications (such as vision, language processing, and robotics) and acceleration scenarios (such as wearable devices, mobile platforms, and cloud computing) require different performance optimizations, and existing tools cannot fully meet these requirements.

### Goals of SynDCIM:

- **Automated performance-to-layout generation**: Automatically generate the optimal DCIM architecture and complete layout according to user-defined performance metrics.
- **Flexible data precision and scalable array parameters**: Support multiple data formats and array sizes.
- **Multi-specification-oriented optimization**: Ensure that the generated DCIM macros are optimal in terms of energy efficiency, throughput, and area efficiency.

Through these improvements, SynDCIM aims to enhance the flexibility and efficiency of DCIM designs, enabling them to better adapt to the requirements of various AI applications and simplifying their integration into modern VLSI workflows.
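To make the idea of multi-specification-oriented selection concrete, the following is a minimal sketch and not SynDCIM's actual searching algorithm, which this summary does not describe. It assumes a hypothetical two-entry subcircuit library with made-up energy, delay, and area figures, a simplified cost model in which all three metrics add across subcircuits, and a brute-force enumeration that keeps only the combinations meeting the user's limits and lying on the Pareto front. All names (`Variant`, `LIBRARY`, `search`) are illustrative.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    """One implementation option for a subcircuit (all numbers are made up)."""
    name: str
    energy_fj: int   # energy per operation, lower is better
    delay_ps: int    # propagation delay, lower is better
    area_um2: int    # layout area, lower is better


# Hypothetical library: each subcircuit type has several interchangeable variants.
LIBRARY = {
    "adder_tree": [
        Variant("ripple", 1000, 4000, 100),
        Variant("compressor", 1400, 2500, 140),
        Variant("carry_save", 2000, 1500, 220),
    ],
    "accumulator": [
        Variant("small", 500, 1500, 50),
        Variant("fast", 900, 800, 90),
    ],
}


def combine(choice):
    """Simplifying assumption: energy, delay, and area all add across subcircuits."""
    return (
        sum(v.energy_fj for v in choice),
        sum(v.delay_ps for v in choice),
        sum(v.area_um2 for v in choice),
    )


def dominates(a, b):
    """True if cost tuple a is no worse than b everywhere and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def search(library, max_delay_ps, max_area_um2):
    """Enumerate all combinations, filter by user limits, keep the Pareto front."""
    candidates = []
    for choice in product(*library.values()):
        cost = combine(choice)
        if cost[1] <= max_delay_ps and cost[2] <= max_area_um2:
            candidates.append((tuple(v.name for v in choice), cost))
    front = [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]
    # Sort by energy so the most energy-efficient feasible design comes first.
    return sorted(front, key=lambda c: c[1][0])


if __name__ == "__main__":
    for names, cost in search(LIBRARY, max_delay_ps=4500, max_area_um2=300):
        print(names, cost)
```

Exhaustive enumeration is only workable for a toy library like this one; a real compiler would need a guided search and a far more detailed cost model than simple addition.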

SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis
