Abstract:
Background: Spiking neural networks (SNNs), the next generation of neural networks inspired by the human brain, have proved promising for building energy-efficient systems because of their inherently dynamic representation and information-processing mechanisms. In deep reinforcement learning (DRL), however, SNNs usually perform worse than deep neural networks on high-dimensional continuous control. Related studies have shown that SNNs suffer from insufficient model representation capacity and ineffective parameter training methods. To bring SNNs to the level of practical application, we target challenging high-dimensional control problems formulated as partially observable Markov decision processes (POMDPs).
Method: Building on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, we propose Spiking Memory TD3 (SM-TD3), a hybrid training framework that combines a spiking Long Short-Term Memory (Spiking-LSTM) policy network with a deep critic network. The policy uses population encoding to improve input encoding precision, a Spiking-LSTM to provide memory, and spatio-temporal backpropagation to train its parameters.
Results and Conclusions: We evaluate SM-TD3 on the PyBullet benchmark in three settings: the fully observable Markov decision process (MDP), random noise, and randomly missing sensor readings. The results show that SM-TD3, with its energy-efficient SNN framework, solves POMDP problems on large-scale, high-dimensional tasks and reaches the same performance level as the deep LSTM-TD3 algorithm. SM-TD3 also remains competitively robust when transferred to the same environment under different conditions. Finally, we analyze energy consumption: SM-TD3 consumes only 20% to 50% of the energy of deep LSTM-TD3.
For practical applications, the proposed SM-TD3 offers an effective solution that is both high-performing and energy-efficient.
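The abstract names population encoding as the input stage of the policy but does not give its form. As a rough illustration only, the sketch below assumes a common variant: each observation dimension drives a small population of neurons with Gaussian receptive fields, and spikes are then sampled as Bernoulli events over a few timesteps. The function name `population_encode`, the parameters `pop_size`, `sigma`, `low`, `high`, and `timesteps`, and the Bernoulli sampling step are all hypothetical choices, not details taken from the paper.

```python
import math
import random


def population_encode(obs, pop_size=10, low=-1.0, high=1.0,
                      sigma=0.15, timesteps=5, seed=None):
    """Hypothetical population encoder (assumed form, not the paper's).

    Each observation value is seen by `pop_size` neurons whose Gaussian
    receptive-field centers are spread evenly over [low, high].
    Returns (drive, spikes):
      drive  - flat list of firing probabilities, length len(obs) * pop_size
      spikes - `timesteps` rows of 0/1 samples drawn from those probabilities
    """
    rng = random.Random(seed)
    step = (high - low) / (pop_size - 1)
    centers = [low + i * step for i in range(pop_size)]

    # A neuron responds most strongly when the input sits on its center.
    drive = [math.exp(-0.5 * ((x - c) / sigma) ** 2)
             for x in obs for c in centers]

    # Bernoulli sampling: a neuron fires at a timestep with probability `drive`.
    spikes = [[1 if rng.random() < p else 0 for p in drive]
              for _ in range(timesteps)]
    return drive, spikes


# Example: a 2-dimensional observation encoded by 5 neurons per dimension.
drive, spikes = population_encode([0.0, 1.0], pop_size=5, timesteps=4, seed=0)
```

Spreading each scalar over several neurons is what lets a binary spike train carry a finer-grained value than a single neuron could; the resulting spike rows would then be fed, one timestep at a time, to the spiking policy network.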

Reinforcement Learning with Feedback-modulated TD-STDP

Reinforcement Learning in Spiking Neural Networks with Stochastic and Deterministic Synapses

Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting

Reinforcement Learning in a Neurally Controlled Robot Using Dopamine Modulated STDP

Reinforcement learning using a continuous time actor-critic framework with spiking neurons

Representation Learning using Event-based STDP

Time-Integrated Spike-Timing-Dependent-Plasticity

STCA: Spatio-Temporal Credit Assignment with Delayed Feedback in Deep Spiking Neural Networks

TDSTDP

Learning Feedforward and Recurrent Deterministic Spiking Neuron Network Feedback Controllers

Cooperation of Spike Timing-Dependent and Heterosynaptic Plasticities in Neural Networks: A Fokker-Planck Approach

MSTDP: A More Biologically Plausible Learning

A Stochastic Approach to STDP

Competitive Hebbian learning through spike-timing-dependent synaptic plasticity

BP-STDP: Approximating backpropagation using spike timing dependent plasticity

An autonomous learning mobile robot using biological reward modulate STDP

Spatial Properties of STDP in a Self-Learning Spiking Neural Network Enable Controlling a Mobile Robot

Biologically Plausible Variational Policy Gradient with Spiking Recurrent Winner-Take-All Networks

S-TLLR: STDP-inspired Temporal Local Learning Rule for Spiking Neural Networks

Training spiking neuronal networks to perform motor control using reinforcement and evolutionary learning

Spiking Memory Policy with Population-encoding for Partially Observable Markov Decision Process Problems