Abstract: Energy-efficient control of mobile robots is crucial because their real-world applications increasingly involve high-dimensional observation and action spaces, whose computational demands cannot be met by limited on-board resources. An emerging non-Von Neumann model of intelligence, in which spiking neural networks (SNNs) run on neuromorphic processors, is regarded as an energy-efficient and robust alternative to state-of-the-art real-time robotic controllers for low-dimensional control tasks. The challenge for this new computing paradigm is to scale so that it can keep up with real-world tasks. To do so, SNNs need to overcome the inherent limitations of their training, namely the limited ability of spiking neurons to represent information and the lack of effective learning algorithms. Here, we propose a population-coded spiking actor network (PopSAN) trained in conjunction with a deep critic network using deep reinforcement learning (DRL). The population coding scheme dramatically increased the representation capacity of the network, and the hybrid learning combined the training advantages of deep networks with the energy-efficient inference of spiking networks. To show the general applicability of our approach, we integrated it with a spectrum of both on-policy and off-policy DRL algorithms. We deployed the trained PopSAN on Intel's Loihi neuromorphic chip and benchmarked our method against mainstream DRL algorithms for continuous control. To allow for a fair comparison among all methods, we validated them on OpenAI Gym tasks. Our Loihi-run PopSAN consumed 140 times less energy per inference than the deep actor network on Jetson TX2 while achieving the same level of performance. Our results support the efficiency of neuromorphic controllers and suggest our hybrid RL as an alternative to deep learning when both energy efficiency and robustness are important.
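As a rough illustration of the population coding idea mentioned in the abstract, the following sketch encodes each dimension of a continuous observation with a small population of neurons that have Gaussian receptive fields, then samples spikes from the resulting activations. The abstract does not give the encoder's details, so everything here is an assumption made for illustration: the function names `population_encode` and `sample_spikes`, the evenly spaced centers, the fixed width `sigma`, the population size, and the Bernoulli spike sampling are all hypothetical and may differ from the actual PopSAN encoder.

```python
import numpy as np

def population_encode(obs, pop_size=10, low=-1.0, high=1.0, sigma=0.15):
    """Map each observation dimension to `pop_size` activations in [0, 1].

    Assumption: receptive-field centers are evenly spaced over [low, high]
    and share one fixed width `sigma`.
    """
    obs = np.asarray(obs, dtype=float)
    centers = np.linspace(low, high, pop_size)
    # Distance of every observation value to every center: (obs_dim, pop_size)
    diff = obs[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / sigma) ** 2)

def sample_spikes(activations, timesteps=5, seed=0):
    """Draw Bernoulli spikes per timestep, using activations as firing probabilities."""
    rng = np.random.default_rng(seed)
    draws = rng.random((timesteps,) + activations.shape)
    return (draws < activations).astype(np.uint8)

# Two-dimensional observation, ten neurons per dimension, five timesteps.
acts = population_encode([0.0, 0.5])
spikes = sample_spikes(acts)
```

Here `acts` has one row of activations per observation dimension, and `spikes` adds a leading time axis. Spreading a single scalar over many neurons is what lets a network of binary spiking units represent a continuous value with finer resolution than one neuron could.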

HDPG: hyperdimensional policy-based reinforcement learning for continuous control

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Policy ensemble gradient for continuous control problems in deep reinforcement learning

Robot Control in Human Environment Using Deep Reinforcement Learning and Convolutional Neural Network

Path Following for Autonomous Ground Vehicle Using DDPG Algorithm: A Reinforcement Learning Approach

Human-in-the-Loop Reinforcement Learning in Continuous-Action Space

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios

FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights

A Modified Convergence DDPG Algorithm for Robotic Manipulation

Data-Efficient Hierarchical Reinforcement Learning for Robotic Assembly Control Applications

Continuous control with deep reinforcement learning

A Hierarchical Framework for Quadruped Robots Gait Planning Based on DDPG

Broad Critic Deep Actor Reinforcement Learning for Continuous Control

Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients

DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies

Deep Reinforcement Learning with Population-Coded Spiking Neural Network for Continuous Control

A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot