Abstract: This paper investigates the effectiveness of spiking agents trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing-dependent plasticity (STDP) and compares it with reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, we develop a computational model in which two spiking neural networks are implemented as two "selfish" agents that learn simultaneously but independently while competing in the IPD game. The purpose of the system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can be achieved only when the agents engage in mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study uses reward-modulated STDP with an eligibility trace, and the results show that the system exhibits the desired behaviour by establishing mutual cooperation between the agents. The cooperative outcome was attained after a relatively short learning period, which enhanced the accumulation of reward by the system. As in our previous implementation, successful application of the learning algorithm to the IPD became possible only after we extended it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover, it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through (i) strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' leaky integrate-and-fire (LIF) neurons with a partial somatic reset mechanism.
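The claim that the collective's reward is maximised only under mutual cooperation, even though each "selfish" agent is tempted to defect, can be illustrated with a minimal sketch. The payoff values below (T=5, R=3, P=1, S=0) are the common textbook choice and are an assumption here, since the abstract does not state the payoffs used; the function names are likewise hypothetical.

```python
# Assumed textbook Prisoner's Dilemma payoffs (not given in the abstract).
# They satisfy T > R > P > S and 2R > T + S.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(a, b)] = (reward to agent A, reward to agent B)
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def collective_reward(a, b):
    """Sum of both agents' rewards for a single round."""
    reward_a, reward_b = PAYOFF[(a, b)]
    return reward_a + reward_b


def best_joint_action():
    """Joint action that maximises the collective (summed) reward."""
    return max(PAYOFF, key=lambda joint: collective_reward(*joint))


def best_response(opponent_action):
    """Action maximising agent A's own one-round reward against a fixed opponent."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])


if __name__ == "__main__":
    print("best joint action:", best_joint_action())
    print("selfish reply to C:", best_response("C"))
    print("selfish reply to D:", best_response("D"))
```

Under these assumed payoffs, defection is each agent's best one-round reply whatever the opponent does, while the summed reward is highest when both cooperate. This is the tension that the learning agents have to resolve.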
