Abstract: The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of the traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference, underpinned by a cutting-edge memory device, Y-Flash, fabricated on a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. IMPACT leverages the Y-Flash array to implement inference for a novel machine learning algorithm based on propositional logic, the coalesced Tsetlin machine (CoTM). CoTM utilizes Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. IMPACT is organized into two computational crossbars for storing the TA and weights. Validated on the MNIST dataset, IMPACT achieved 96.3% accuracy. IMPACT also demonstrated improvements in energy efficiency, e.g., 2.23X over CNN-based ReRAM, 2.46X over Neuromorphic using NOR-Flash, and 2.06X over DNN-based PCM, making it suited for modern ML inference applications.

What problem does this paper attempt to address?

The core problem this paper addresses is the bottleneck that the traditional von Neumann architecture hits when handling large-scale machine-learning tasks: the data bandwidth requirement exceeds its processing capacity, which increases latency and energy consumption. As machine-learning models process ever larger amounts of data, traditional computing architectures struggle to handle such data-driven workloads efficiently. The paper therefore proposes IMPACT, an in-memory computing (IMC) architecture based on Y-Flash technology.

### Specific description of the problem

1. **Data bandwidth and energy consumption problems**:
   - The traditional von Neumann architecture separates the processing unit from the storage unit, which leads to frequent data transfers when handling large-scale data and thereby increases latency and energy consumption.
   - As the data dimension in machine-learning applications increases, these challenges become more severe.
2. **Limitations of existing solutions**:
   - Digital CMOS memories (such as SRAM and DRAM) can perform parallel data processing, but they suffer from static leakage, frequent logic transitions, and refresh, resulting in increased energy consumption and latency.
   - Emerging non-volatile analog memory devices (such as ReRAM, PCM, and MRAM) have potential, but also face challenges in reliability and stability.

### Proposed solutions

To solve the above problems, the paper proposes the IMPACT architecture, whose main features include:

- **Using Y-Flash technology**: Y-Flash is a new type of non-volatile memristor that combines the advantages of digital and analog memories and offers high precision, low power consumption, and high-density integration.
- **In-memory computing (IMC)**: Integrating storage and computing in the same architecture reduces data transfer overhead and improves parallel processing, which significantly reduces latency and energy consumption.
- **Coalesced Tsetlin Machine (CoTM)**: A new machine-learning algorithm based on propositional logic that uses Tsetlin automata (TA) for Boolean feature selection and simplifies the model by merging clauses to improve efficiency and interpretability.

### Main contributions

1. **Developed the first Y-Flash-based IMC architecture, IMPACT, for CoTM inference**.
2. **Explored the data processing flow of the CoTM algorithm and its hardware implementation, demonstrating its scalability**.
3. **Analyzed the resilience of Y-Flash technology to device variability, showing its suitability for real-time machine-learning applications**.
4. **Used Y-Flash devices to accelerate inference while preserving the interpretability of the model, which is crucial for fields requiring transparent decision-making**.

With these contributions, the IMPACT architecture achieves 96.3% accuracy on the MNIST dataset and is more energy efficient than existing CNN, DNN, and other neuromorphic computing approaches.
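To make the CoTM inference step more concrete, the following is a minimal functional sketch in Python, not the paper's implementation. It assumes the usual Tsetlin machine formulation: each clause is an AND over the literals (features and their negations) that its TA have chosen to include, and each class score is a weighted sum of clause outputs shared across classes. The function name `cotm_infer`, the array layouts, the toy values, and the convention that an empty clause outputs 0 during inference are all assumptions made for illustration. Loosely, `include` plays the role of the contents of the TA crossbar and `weights` that of the weight crossbar; no electrical behavior of Y-Flash is modeled.

```python
import numpy as np

def cotm_infer(x, include, weights):
    """Functional sketch of coalesced Tsetlin machine inference (illustrative only).

    x       : (n_features,) Boolean input
    include : (n_clauses, 2 * n_features) Boolean TA include decisions,
              columns ordered as [features, negated features] (assumed layout)
    weights : (n_classes, n_clauses) integer clause weights
    """
    x = np.asarray(x, dtype=bool)
    literals = np.concatenate([x, ~x])  # each feature and its negation

    # A clause fails if any literal it includes evaluates to 0.
    violated = (include & ~literals).any(axis=1)
    clauses = ~violated

    # Assumed convention: a clause with no included literals outputs 0 at inference.
    clauses &= include.any(axis=1)

    # Every class reuses the same clause outputs, each with its own weights.
    class_sums = weights @ clauses.astype(int)
    return int(np.argmax(class_sums)), class_sums

# Hypothetical toy model: 2 features, 3 clauses, 2 classes.
include = np.array([
    [1, 0, 0, 1],  # clause 0: x0 AND NOT x1
    [0, 1, 0, 0],  # clause 1: x1
    [0, 0, 0, 0],  # clause 2: empty
], dtype=bool)
weights = np.array([
    [2, -1, 5],   # class 0
    [-1, 3, 5],   # class 1
])

pred_a, sums_a = cotm_infer([1, 0], include, weights)
pred_b, sums_b = cotm_infer([0, 1], include, weights)
```

In this toy model the input `[1, 0]` activates only clause 0 and `[0, 1]` activates only clause 1, so the two inputs are assigned to class 0 and class 1 respectively. The violation test and the weighted sum are the two steps that an IMC design would map onto its arrays, which is why the sketch keeps them as separate array operations.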

IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

The Impact of Non-linear NVM Devices on In-Memory Computing

In-Memory Learning Automata Architecture using Y-Flash Cell

Neural Network Acceleration and Voice Recognition with a Flash-based In-Memory Computing SoC

IMBUE: In-Memory Boolean-to-CUrrent Inference ArchitecturE for Tsetlin Machines

Flash Memory Array for Efficient Implementation of Deep Neural Networks

In-Memory Computing: The Next-Generation AI Computing Paradigm

In-Memory Computing Integrated Structure Circuit Based on Nonvolatile Flash Memory Unit

Design of Computing-in-Memory (CIM) with Vertical Split-Gate Flash Memory for Deep Neural Network (DNN) Inference Accelerator

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing.

In-memory Computing with Emerging Nonvolatile Memory Devices

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

Ferroelectric FET Based In-Memory Computing for Few-Shot Learning.

IMAC: In-Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array

Hdc-Im: Hyperdimensional Computing In-Memory Architecture Based On Rram

Flash-based Computing In-Memory Scheme for IOT.

A Heterogeneous Microprocessor for Intermittent AI Inference Using Nonvolatile-SRAM-based Compute-In-Memory

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

A Reconfigurable 4T2R ReRAM Computing In-Memory Macro for Efficient Edge Applications

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.