Efficient Reinforcement Learning On Passive RRAM Crossbar Array

Arjun Tyagi, Shubham Sahay
2024-07-11
Abstract: The unprecedented growth of machine learning has led to the development of deep neuromorphic networks that, trained on labelled datasets, can mimic or even exceed human capabilities. However, for applications involving continuous decision-making in unknown environments, such as space-exploration rovers, robots, and unmanned aerial vehicles, explicit supervision and the generation of labelled datasets are extremely difficult and expensive. Reinforcement learning (RL) allows agents to make decisions without any (human/external) supervision or training on labelled datasets. However, conventional implementations of RL on advanced digital CPUs/GPUs incur significant power dissipation owing to their inherent von Neumann architecture. Crossbar arrays of emerging non-volatile memories such as resistive RAMs (RRAMs) can perform energy-efficient in situ multiply-accumulate operations, which makes them promising for Q-learning-based RL implementations. However, their limited endurance restricts their use in practical RL systems, which require an overwhelming number of weight updates. To address this issue and realize the true potential of RRAM-based RL implementations, in this work we, for the first time, perform an algorithm-hardware co-design and propose a novel implementation of the Monte Carlo (MC) RL algorithm on a passive RRAM crossbar array. We analyse the performance of the proposed MC RL implementation on the classical cart-pole problem and demonstrate that it not only outperforms prior digital and active 1-Transistor-1-RRAM (1T1R)-based implementations by more than five orders of magnitude in terms of area but is also robust against spatial and temporal variations and endurance failure of RRAMs.
Emerging Technologies
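The abstract relies on the crossbar's ability to perform multiply-accumulate (MAC) operations in situ. The sketch below models only the ideal case, assuming linear devices, zero wire resistance, and no sneak-path currents (the very non-ideality a passive array must contend with). The one-hot row encoding of the state and the helper names `crossbar_mac`, `read_q_row`, and `greedy_action` are hypothetical illustrations, not the paper's circuit.

```python
import math
from typing import List, Sequence


def crossbar_mac(conductances: Sequence[Sequence[float]],
                 voltages: Sequence[float]) -> List[float]:
    """Ideal crossbar MAC: column current I_j = sum_i V_i * G_ij.

    Assumes linear devices, zero wire resistance and no sneak-path currents.
    conductances[i][j] is the conductance (S) of the cell at row i, column j;
    voltages[i] is the read voltage (V) applied to row i.
    """
    if len(conductances) != len(voltages):
        raise ValueError("need exactly one read voltage per crossbar row")
    n_cols = len(conductances[0])
    # Ohm's law gives each cell current V_i * G_ij; Kirchhoff's current law sums
    # them on column j. fsum keeps the floating-point sum accurate.
    return [math.fsum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def read_q_row(conductances: Sequence[Sequence[float]], state: int,
               v_read: float = 0.2) -> List[float]:
    """Hypothetical one-hot read: drive only the row of `state` with v_read.

    If column j stores Q(state, a_j) as a conductance, the column currents are
    proportional to the Q-values of every action, obtained in a single read.
    """
    voltages = [v_read if i == state else 0.0 for i in range(len(conductances))]
    return crossbar_mac(conductances, voltages)


def greedy_action(conductances: Sequence[Sequence[float]], state: int,
                  v_read: float = 0.2) -> int:
    """Index of the column (action) with the largest read current."""
    currents = read_q_row(conductances, state, v_read)
    return max(range(len(currents)), key=currents.__getitem__)
```

For example, `crossbar_mac([[1e-6, 2e-6], [3e-6, 4e-6], [5e-6, 6e-6]], [0.1, 0.2, 0.0])` yields column currents of approximately 0.7 µA and 1.0 µA in one read step, instead of one digital multiply per stored weight.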
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Reducing energy consumption in reinforcement learning (RL) hardware**: Traditional RL algorithms, when implemented on advanced digital CPUs/GPUs, consume a significant amount of energy because the von Neumann architecture separates memory from compute and must constantly shuttle data between them. The paper proposes implementing a Monte Carlo (MC) RL algorithm on passive RRAM crossbar arrays to reduce energy consumption.
2. **Making RL algorithms hardware-friendly**: Most previous hardware implementations have focused on neural-network-based RL algorithms (such as deep Q-learning), which require frequent weight updates and therefore strain the limited endurance of the memory devices. In contrast, MC learning updates the weights only at the end of each episode, significantly reducing the number of weight updates and thus the endurance burden on the devices.
3. **Exploring the potential of passive RRAM crossbar arrays**: Compared to active 1T1R crossbar arrays, passive RRAM crossbar arrays have a smaller area overhead but suffer from issues such as sneak-path currents. The paper demonstrates the potential of passive RRAM crossbar arrays for implementing RL algorithms by optimizing the stack design and showing superior performance on the classic cart-pole problem.
4. **Achieving efficient and robust RL systems**: Through algorithm-hardware co-design, the paper proposes a new hardware implementation of MC RL that not only reduces area by more than five orders of magnitude compared to prior digital and active 1T1R-based implementations but is also robust to spatial and temporal variations and to endurance failures of RRAM devices.
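Point 2 hinges on *when* the weights are written. The sketch below assumes a tabular Q-table indexed by discretized (state, action) pairs; the discretization and the function names `episode_returns`, `mc_end_of_episode_update`, and `per_step_write_count` are hypothetical. It contrasts a per-step rule, which writes one cell at every time step, with an MC update applied only after the episode ends. The MC update here is a batched every-visit variant that averages the returns of repeated visits and writes each touched cell once; this write-merging is an illustrative assumption, not necessarily the paper's exact update rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]         # (discretized state index, action index)
Step = Tuple[int, int, float]  # (state, action, reward) recorded during the episode


def episode_returns(episode: List[Step], gamma: float) -> Dict[Pair, float]:
    """Discounted return G_t for every step, averaged over all visits of each (s, a)."""
    g = 0.0
    sums: Dict[Pair, float] = defaultdict(float)
    counts: Dict[Pair, int] = defaultdict(int)
    # Walk the trajectory backwards so that G_t = r_t + gamma * G_{t+1}.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        sums[(state, action)] += g
        counts[(state, action)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def mc_end_of_episode_update(q: Dict[Pair, float], episode: List[Step],
                             alpha: float = 0.1, gamma: float = 0.99) -> int:
    """Apply the MC update once, after the episode ends; return the number of cell writes."""
    targets = episode_returns(episode, gamma)
    for pair, g_mean in targets.items():
        old = q.get(pair, 0.0)
        # Hypothetical: one conductance update per distinct (s, a) cell in this episode.
        q[pair] = old + alpha * (g_mean - old)
    return len(targets)


def per_step_write_count(episode: List[Step]) -> int:
    """A per-step rule such as Q-learning writes one cell at every time step."""
    return len(episode)


if __name__ == "__main__":
    # Toy five-step episode (reward +1 per surviving step, as in cart-pole).
    episode = [(0, 1, 1.0), (1, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0), (2, 1, 1.0)]
    q: Dict[Pair, float] = {}
    writes = mc_end_of_episode_update(q, episode, alpha=1.0, gamma=1.0)
    print(writes, per_step_write_count(episode))  # prints "3 5"
```

In this toy episode only three distinct (state, action) pairs are visited, so the MC sketch issues three cell writes after the episode ends, whereas a per-step rule would issue five while the episode is running.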