Abstract: With the help of specialized neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with lower energy consumption. Combining SNNs with deep reinforcement learning (DRL) offers a promising energy-efficient approach to realistic control tasks. In this article, we focus on tasks in which the agent must learn a multidimensional deterministic control policy, a setting that is very common in real scenarios. Recently, the surrogate gradient method has been used to train multilayer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on such tasks. Most existing spike-based reinforcement learning (RL) methods take the firing rate as the output of the SNN and convert it into a continuous action (i.e., the deterministic policy) through a fully connected (FC) layer. However, because the firing rate is a decimal (non-integer) value, the FC layer requires floating-point matrix operations, which prevents the whole SNN from being deployed directly on neuromorphic hardware. To develop a fully spiking actor network (SAN) without any floating-point matrix operations, we draw inspiration from the nonspiking interneurons found in insects and use the membrane voltage of nonspiking neurons to represent the action. Before the nonspiking neurons, multiple neuron populations are introduced to decode the different action dimensions. Since each population decodes one action dimension, we argue that the neurons within each population should be connected in both the time domain and the space domain. Hence, intralayer connections are used in the output populations to enhance their representation capacity. This mechanism exists widely in animals and has been shown to be effective. Finally, we propose a fully SAN with intralayer connections (ILC-SAN). Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods on continuous control tasks from OpenAI Gym. Moreover, we estimate the theoretical energy consumption of deploying ILC-SAN on neuromorphic chips to illustrate its high energy efficiency.
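
The decoding idea described above, in which each output population drives a nonspiking neuron whose membrane voltage represents one action dimension, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`nonspiking_decode`, `decode_action`), the `decay` and `bias` parameters, and the choice to read the voltage at the last time step are all assumptions made for illustration, and the intralayer connections are omitted. The sketch only shows why binary spikes avoid floating-point matrix products: the synaptic input reduces to adding the weights of the neurons that fired.

```python
import numpy as np


def nonspiking_decode(spikes, weights, bias=0.0, decay=1.0):
    """Decode one action dimension from one population's spike train.

    spikes:  (T, N) array of 0/1 spikes over T time steps (hypothetical layout)
    weights: (N,) synaptic weights onto the nonspiking neuron
    decay:   per-step leak factor; 1.0 means a perfect integrator (assumption)
    Returns the membrane voltage after the last time step (assumed readout).
    """
    spikes = np.asarray(spikes, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    v = 0.0
    for s_t in spikes:
        # Spikes are binary, so the input current is just the sum of the
        # weights of the neurons that fired: additions, no matrix multiply.
        current = weights[s_t].sum() + bias
        # No threshold and no reset: the neuron integrates but never fires.
        v = decay * v + current
    return v


def decode_action(pop_spikes, pop_weights):
    """Decode a multidimensional action, one population per dimension."""
    return np.array(
        [nonspiking_decode(s, w) for s, w in zip(pop_spikes, pop_weights)]
    )


# Toy example: two action dimensions, three neurons each, two time steps.
pop_spikes = [
    np.array([[1, 0, 1], [0, 0, 1]]),
    np.array([[0, 1, 0], [0, 1, 0]]),
]
pop_weights = [
    np.array([0.5, -0.25, 0.125]),
    np.array([1.0, -0.5, 2.0]),
]
action = decode_action(pop_spikes, pop_weights)
print(action)
```

In this toy setting the first dimension accumulates 0.5 + 0.125 at the first step and 0.125 at the second, and the second dimension accumulates -0.5 twice. A rate-based decoder would instead feed fractional firing rates into a dense layer, which is the floating-point matrix operation the abstract aims to avoid.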
