Abstract: Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.

What problem does this paper attempt to address?

The main problem this paper addresses is the "von Neumann bottleneck" that existing computing architectures encounter when handling deep neural network (DNN) workloads. Traditional von Neumann architectures cannot keep up with the growing computational demands of modern machine-learning models, because the frequent movement of data between the processor and memory incurs high latency and high energy consumption. To address this bottleneck, the paper proposes OPIMA (Optical Processing-In-Memory Accelerator), a processing-in-memory (PIM) architecture built within an optical main memory. OPIMA accelerates convolutional neural networks (CNNs) by performing optical computation inside the main memory, which reduces data movement and improves energy efficiency.

The specific problems the paper targets are:

1. **Limitations of existing PIM architectures**:
   - DRAM-based PIM struggles to achieve high throughput and high energy efficiency because of internal data movement bottlenecks and frequent refresh operations.
   - Non-volatile memories (such as ReRAM and STT-RAM) face manufacturing challenges and endurance problems.
   - Although PCM offers high energy efficiency and bit density, under electrical control it suffers from non-linear response and resistance drift.
2. **Advantages of photonic computing**:
   - Photonic computing exploits the parallelism and low energy consumption of light waves, which makes it well suited to large-scale matrix operations.
   - Optimizing the optical properties of the PCM material can improve the accuracy and speed of data readout without increasing power consumption.
3. **Design goals of the OPIMA architecture**:
   - Provide efficient multi-bit-density storage cells to support complex ML computations.
   - Achieve high-speed, low-energy optical computation, significantly improving the throughput and energy efficiency of ML inference.
   - Address the data interference and thermal crosstalk problems of traditional PIM architectures to ensure reliable computation.

The paper shows that OPIMA has significant performance and energy advantages over existing electronic computing systems and other emerging photonic PIM architectures: experimental results show 2.98 times higher throughput and 137 times better energy efficiency than the best-known prior work.

### Formula summary

The key formulas in the paper include:

1. **Optical transmission change model**:

   \[ T_{\text{out}} = T_{\text{in}} - \Delta T_s - P_{\text{abs}} \]

   where:
   - \( T_{\text{out}} \) is the output transmission,
   - \( T_{\text{in}} \) is the input power,
   - \( \Delta T_s \) is the change in optical transmission due to light scattering and back-reflection,
   - \( P_{\text{abs}} \) is the total power absorbed by the PCM cell.
2. **Objective of the optimized design**:

   \[ \Delta T_s = 0 \quad \Rightarrow \quad T_{\text{out}} = T_{\text{in}} - P_{\text{abs}} \]

   The goal is to drive \( \Delta T_s \) to zero so that the change in the output signal is fully represented by the written data (\( P_{\text{abs}} \)).
3. **Optical transmission contrast**:

   \[ \Delta T = T_{\text{amorphous}} - T_{\text{crystalline}} \]

   where:
   - \( T_{\text{amorphous}} \) is the optical transmission in the amorphous state,
   - \( T_{\text{crystalline}} \) is the optical transmission in the crystalline state.

Through these improvements, OPIMA addresses the bottlenecks that traditional computing architectures face on deep-learning workloads and offers a faster, more energy-efficient alternative.
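To make the formula summary concrete, here is a minimal Python sketch of the transmission model, the \( \Delta T_s = 0 \) design objective, and the transmission contrast. It shows why a nonzero scattering term corrupts a stored value. All names (`PCMCell`, `output_transmission`, `transmission_contrast`, `p_abs_for_level`, `read_level`) are hypothetical, all quantities are assumed to be normalized to the input power, and the uniform multi-level mapping between the crystalline and amorphous transmissions is an illustrative assumption, not OPIMA's documented readout scheme.

```python
from dataclasses import dataclass


@dataclass
class PCMCell:
    """Hypothetical optical PCM cell; values are normalized to the input power."""
    p_abs: float             # power absorbed by the PCM, which encodes the written data
    delta_t_s: float = 0.0   # transmission change from scattering and back-reflection


def output_transmission(t_in: float, cell: PCMCell) -> float:
    """Transmission-change model: T_out = T_in - dT_s - P_abs."""
    return t_in - cell.delta_t_s - cell.p_abs


def transmission_contrast(t_amorphous: float, t_crystalline: float) -> float:
    """Optical transmission contrast: dT = T_amorphous - T_crystalline."""
    return t_amorphous - t_crystalline


def p_abs_for_level(level: int, t_in: float, t_crystalline: float,
                    t_amorphous: float, bits: int) -> float:
    """Assumed write rule: place each of 2**bits levels evenly across the
    contrast window, then solve T_out = T_in - P_abs (ideal case dT_s = 0)."""
    levels = 2 ** bits - 1
    target = t_crystalline + level * transmission_contrast(t_amorphous, t_crystalline) / levels
    return t_in - target


def read_level(t_out: float, t_crystalline: float, t_amorphous: float, bits: int) -> int:
    """Assumed readout: map T_out back to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    frac = (t_out - t_crystalline) / transmission_contrast(t_amorphous, t_crystalline)
    frac = min(max(frac, 0.0), 1.0)  # clamp readings outside the contrast window
    return round(frac * levels)


if __name__ == "__main__":
    t_in, t_cry, t_amo, bits = 1.0, 0.125, 0.875, 2
    p_abs = p_abs_for_level(2, t_in, t_cry, t_amo, bits)  # write level 2 of 0..3

    ideal = PCMCell(p_abs=p_abs)                    # dT_s = 0, the design objective
    lossy = PCMCell(p_abs=p_abs, delta_t_s=0.25)    # scattering present

    print(read_level(output_transmission(t_in, ideal), t_cry, t_amo, bits))  # -> 2
    print(read_level(output_transmission(t_in, lossy), t_cry, t_amo, bits))  # -> 1
```

With \( \Delta T_s = 0 \), the drop in transmission comes entirely from \( P_{\text{abs}} \), so the written level reads back correctly. A scattering loss of 0.25 (in these normalized units) shifts \( T_{\text{out}} \) by a full level step and the cell reads back as level 1, which illustrates why the design objective targets \( \Delta T_s = 0 \).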

OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration

A design framework for processing-in-memory accelerator

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

PIXEL: Photonic Neural Network Accelerator

ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

ParaPIM: A Parallel Processing-In-Memory Accelerator For Binary-Weight Deep Neural Networks

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

CMP-PIM: An Energy-Efficient Comparator-based Processing-In-Memory Neural Network Accelerator

Hyperspectral In-Memory Computing with Optical Frequency Combs and Programmable Optical Memories

Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations

IMCE: Energy-efficient Bit-Wise In-Memory Convolution Engine for Deep Neural Network

Optical and Electrical Memories for Analog Optical Computing

Ultra-High-Speed Accelerator Architecture for Convolutional Neural Network Based on Processing-in-Memory Using Resistive Random Access Memory

DyPIM: Dynamic-Inference-Enabled Processing-In-Memory Accelerator

Accelerating Deep Neural Networks in Processing-in-Memory Platforms: Analog or Digital Approach?

An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference