Abstract: Online model-free reinforcement learning (RL) approaches play a crucial role in real-world applications such as behavioral decision making in robotics. Balancing exploration and exploitation is a central problem in RL: the ratio between the two strongly influences both the total learning time and the quality of the learned strategy. Various action selection policies have therefore been proposed to strike this balance. However, these approaches are rarely regulated automatically and dynamically in response to environmental variations. One of the most remarkable self-adaptation mechanisms in animals is their capacity to switch dynamically between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model that simulates the roles of the medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) then receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in the anterior cingulate cortex (ACC) calculate the vigilance from that signal. The vigilance is then transmitted to the LPFC to regulate the exploration rate, and finally the exploration rate is transmitted to the thalamus to calculate the corresponding action probability. This action selection mechanism is introduced into the actor–critic model of the basal ganglia and combined with a cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model that selects the agent's actions. Both a simulation comparison with four other traditional action selection policies and physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.
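
The signal flow described in the abstract (reward, dopaminergic reinforcement signal, vigilance, exploration rate, action probability) can be illustrated with a minimal Python sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: the reinforcement signal is taken to be a standard temporal-difference error, the vigilance update is a simple running average, the exploration rate is represented as a softmax temperature, and all function names and parameter values are hypothetical rather than the authors' own.

```python
import math

def reinforcement_signal(reward, value_s, value_next, gamma=0.9):
    """VTA stage (assumed form): a standard temporal-difference error."""
    return reward + gamma * value_next - value_s

def update_vigilance(vigilance, delta, rate=0.2):
    """ACC stage (assumed rule): a worse-than-expected outcome pushes
    vigilance toward 1; any other outcome pushes it toward 0."""
    target = 1.0 if delta < 0 else 0.0
    return vigilance + rate * (target - vigilance)

def exploration_temperature(vigilance, t_min=0.1, t_max=2.0):
    """LPFC stage (assumed mapping): higher vigilance gives a higher
    softmax temperature, i.e. more exploration."""
    return t_min + (t_max - t_min) * vigilance

def action_probabilities(preferences, temperature):
    """Thalamus stage (assumed form): softmax over action preferences."""
    m = max(preferences)  # subtract the maximum for numerical stability
    exps = [math.exp((p - m) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    vigilance = 0.5
    preferences = [1.0, 0.5, 0.0]  # hypothetical actor outputs
    # One step with an unexpectedly bad outcome (negative reward).
    delta = reinforcement_signal(reward=-1.0, value_s=0.5, value_next=0.0)
    vigilance = update_vigilance(vigilance, delta)
    temperature = exploration_temperature(vigilance)
    print(action_probabilities(preferences, temperature))
```

In this sketch a negative reinforcement signal raises the vigilance, which raises the temperature and flattens the action distribution, so the agent explores more after a bad outcome. A positive signal has the opposite effect and favors exploitation.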

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment

Lifelong Reinforcement Learning via Neuromodulation

Brain-like neural dynamics for behavioral control develop through reinforcement learning

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Reinforcement Learning in Spiking Neural Networks with Stochastic and Deterministic Synapses

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks

Context meta-reinforcement learning via neuromodulation

Reinforcement Learning and its Connections with Neuroscience and Psychology

Neuron-level prediction and noise can implement flexible reward-seeking behavior

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

The cost of behavioral flexibility: reversal learning driven by a spiking neural network

Neural networks with motivation

Predictive auxiliary objectives in deep RL mimic learning in the brain

Advanced Reinforcement Learning and Its Connections with Brain Neuroscience

Evolving Reservoirs for Meta Reinforcement Learning

A meta reinforcement learning account of behavioral adaptation to volatility in recurrent neural networks

Emergence of integrated behaviors through direct optimization for homeostasis

Biologically plausible local synaptic learning rules robustly implement deep supervised learning

Improving Learning Efficiency of Recurrent Neural Network Through Adjusting Weights of All Layers in a Biologically-Inspired Framework