IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

Omar Ghazal, Wei Wang, Shahar Kvatinsky, Farhad Merchant, Alex Yakovlev, Rishad Shafik
2024-12-04
Abstract: The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present the IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference, underpinned on a cutting-edge memory device, Y-Flash, fabricated on a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. The IMPACT leverages the Y-Flash array to implement the inference of a novel machine learning algorithm: coalesced Tsetlin machine (CoTM) based on propositional logic. CoTM utilizes Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. The IMPACT is organized into two computational crossbars for storing the TA and weights. Through validation on the MNIST dataset, IMPACT achieved 96.3% accuracy. The IMPACT demonstrated improvements in energy efficiency, e.g., 2.23X over CNN-based ReRAM, 2.46X over Neuromorphic using NOR-Flash, and 2.06X over DNN-based PCM, suited for modern ML inference applications.
Hardware Architecture, Artificial Intelligence, Emerging Technologies, Machine Learning
What problem does this paper attempt to address?
The core problem this paper addresses is the bottleneck that the traditional von Neumann architecture hits on large-scale machine-learning workloads: data bandwidth requirements exceed what the architecture can deliver, which increases latency and energy consumption. As machine-learning models process ever larger volumes of data, traditional computing architectures struggle to handle these data-driven workloads efficiently. The paper therefore proposes IMPACT, an in-memory computing (IMC) architecture based on Y-Flash technology.

### Specific description of the problem

1. **Data bandwidth and energy consumption problems**:
   - The traditional von Neumann architecture separates the processing unit from the storage unit. Handling large-scale data therefore requires frequent data transfers between the two, which increases latency and energy consumption.
   - These challenges become more severe as the dimensionality of the data in machine-learning applications grows.
2. **Limitations of existing solutions**:
   - Digital CMOS memories (such as SRAM and DRAM) can process data in parallel, but they suffer from static leakage, frequent logic transitions, and refresh cycles, all of which add energy consumption and latency.
   - Emerging non-volatile analog memory devices (such as ReRAM, PCM, and MRAM) show potential, but they still face reliability and stability challenges.

### Proposed solutions

To address these problems, the paper proposes the IMPACT architecture, whose main features are:

- **Using Y-Flash technology**: Y-Flash is a new type of non-volatile memristor that combines the advantages of digital and analog memories, offering high precision, low power consumption, and high-density integration.
- **In-memory computing (IMC)**: Integrating storage and computation in the same architecture reduces data-transfer overhead and increases parallelism, which significantly lowers latency and energy consumption.
- **Coalesced Tsetlin Machine (CoTM)**: A new machine-learning algorithm based on propositional logic. It uses Tsetlin automata (TA) for Boolean feature selection and simplifies the model by coalescing clauses so that they are shared across classes, which improves efficiency and interpretability.

### Main contributions

1. **Developed IMPACT, the first Y-Flash-based IMC architecture for CoTM inference**.
2. **Explored the data-processing flow of the CoTM algorithm and its hardware implementation, demonstrating its scalability**.
3. **Analyzed the robustness of Y-Flash technology to device variability, showing its suitability for real-time machine-learning applications**.
4. **Used Y-Flash devices to accelerate inference while preserving the interpretability of the model, which is crucial in fields that require transparent decision-making**.

With this design, IMPACT achieves 96.3% accuracy on the MNIST dataset and reports energy-efficiency improvements of 2.23X over CNN-based ReRAM, 2.46X over neuromorphic NOR-Flash, and 2.06X over DNN-based PCM designs.
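
To make the CoTM inference flow concrete, the following is a minimal software sketch, not the paper's hardware implementation. It assumes the usual Tsetlin machine formulation: each clause is a logical AND over the literals (input bits and their negations) that its Tsetlin automata include, and each class score is a signed weighted sum of the shared clause outputs. The names `INCLUDE`, `WEIGHTS`, `literals`, `clause_outputs`, `class_sums`, and `predict` are hypothetical, the toy values are invented for illustration, and treating an empty clause as 0 at inference is an assumed convention. Loosely, `INCLUDE` plays the role of the TA crossbar and `WEIGHTS` the role of the weight crossbar described in the abstract.

```python
from typing import List

# Hypothetical toy model: 2 input bits -> 4 literals [x0, x1, NOT x0, NOT x1].
# INCLUDE[j][k] == 1 means clause j's Tsetlin automaton includes literal k.
INCLUDE: List[List[int]] = [
    [1, 0, 0, 1],  # clause 0: x0 AND NOT x1
    [0, 1, 0, 0],  # clause 1: x1
    [0, 0, 0, 0],  # clause 2: empty (no literal included)
]

# WEIGHTS[c][j] is the signed integer weight of shared clause j for class c.
WEIGHTS: List[List[int]] = [
    [2, -1, 0],   # class 0
    [-1, 3, 0],   # class 1
]


def literals(x: List[int]) -> List[int]:
    """Return the input bits followed by their negations."""
    return x + [1 - b for b in x]


def clause_outputs(lits: List[int], include: List[List[int]]) -> List[int]:
    """Evaluate each clause as the AND of its included literals."""
    outputs = []
    for row in include:
        included = [lit for lit, inc in zip(lits, row) if inc]
        # Assumption: an empty clause outputs 0 during inference.
        outputs.append(int(bool(included) and all(included)))
    return outputs


def class_sums(clauses: List[int], weights: List[List[int]]) -> List[int]:
    """Compute one signed weighted sum of clause outputs per class."""
    return [sum(w * c for w, c in zip(row, clauses)) for row in weights]


def predict(x: List[int], include: List[List[int]], weights: List[List[int]]) -> int:
    """Return the index of the class with the largest weighted clause sum."""
    sums = class_sums(clause_outputs(literals(x), include), weights)
    return max(range(len(sums)), key=sums.__getitem__)


if __name__ == "__main__":
    print(predict([1, 0], INCLUDE, WEIGHTS))
    print(predict([0, 1], INCLUDE, WEIGHTS))
```

Because every class reads the same clause outputs and differs only in its weight row, the clause evaluation is done once per input. This sharing is what distinguishes the coalesced variant from a Tsetlin machine that keeps a separate clause set per class.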