Abstract: Online model-free reinforcement learning (RL) approaches play a crucial role in real-world applications such as behavioral decision making in robotics. Balancing exploration and exploitation is a central problem in RL, because the exploration/exploitation ratio strongly influences both the total learning time and the quality of the learned strategy. Various action selection policies have therefore been proposed to strike this balance. However, these approaches are rarely regulated automatically and dynamically in response to environmental variations. One of the most remarkable self-adaptation mechanisms in animals is their capacity to switch dynamically between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model that simulates the roles of the medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavioral decision making. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) then receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in the anterior cingulate cortex (ACC) calculate the vigilance from that signal. The vigilance is then transmitted to the LPFC to regulate the exploration rate, and the exploration rate is finally transmitted to the thalamus to calculate the corresponding action probabilities. This action selection mechanism is introduced into the actor–critic model of the basal ganglia and combined with the cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model for selecting the agent's actions. Both a simulation comparison with four other traditional action selection policies and physical experiment results demonstrate the potential of the proposed neuromodulatory model for action selection.
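
The signal chain described in the abstract (reward, dopaminergic reinforcement signal, vigilance, exploration rate, action probability) can be illustrated with a minimal sketch. The abstract gives no equations, so every formula below is an assumption chosen for illustration and not the authors' model: the dopaminergic signal is taken to be a standard temporal-difference error, the vigilance is a hypothetical running average that rises after negative surprises, the exploration rate is a softmax temperature that grows linearly with vigilance, and the action probabilities come from a Boltzmann (softmax) rule. All function names and parameter values are hypothetical.

```python
import math


def td_error(reward, value_s, value_next, gamma=0.9):
    """VTA stage (assumed): temporal-difference error as the dopaminergic signal."""
    return reward + gamma * value_next - value_s


def update_vigilance(vigilance, delta, rate=0.1):
    """ACC stage (assumed): move vigilance toward 1 after a negative
    TD error and toward 0 otherwise, so it stays within [0, 1]."""
    target = 1.0 if delta < 0 else 0.0
    return (1.0 - rate) * vigilance + rate * target


def exploration_temperature(vigilance, t_min=0.1, t_max=2.0):
    """LPFC stage (assumed): map vigilance in [0, 1] linearly onto a
    softmax temperature; higher vigilance means more exploration."""
    return t_min + (t_max - t_min) * vigilance


def action_probabilities(preferences, temperature):
    """Thalamus stage (assumed): Boltzmann distribution over action
    preferences, computed with a max-shift for numerical stability."""
    top = max(preferences)
    exps = [math.exp((p - top) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def select_step(preferences, vigilance, reward, value_s, value_next):
    """Run one pass through the assumed chain and return the updated
    vigilance together with the resulting action probabilities."""
    delta = td_error(reward, value_s, value_next)
    vigilance = update_vigilance(vigilance, delta)
    temperature = exploration_temperature(vigilance)
    return vigilance, action_probabilities(preferences, temperature)
```

Under these assumptions, a run of unrewarded steps pushes the vigilance up, raises the temperature, and flattens the action distribution (more exploration); a run of rewarded steps does the opposite and concentrates probability on the preferred action (more exploitation).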

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task

Brain Inspired Episodic Memory Deep Q-Networks for Sparse Reward

Behavioral decision-making of mobile robots simulating the memory consolidation mechanism of human brain

Adaptive coordination of working-memory and reinforcement learning in non-human primates performing a trial-and-error problem solving task

Integrated model of cerebellal supervised learning and basal ganglia's reinforcement learning for mobile robot behavioral decision-making

Episodic Reinforcement Learning with Associative Memory

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Reliance on Episodic vs. Procedural Systems in Decision-Making Depends on Individual Differences in Their Relative Neural Efficiency

Neuronal Representation of a Working Memory-Based Decision Strategy in the Motor and Prefrontal Cortico-Basal Ganglia Loops

Discovering Cognitive Strategies with Tiny Recurrent Neural Networks

Episodic Reinforcement Learning with Expanded State-reward Space

Sample Efficient Reinforcement Learning Method Via High Efficient Episodic Memory

Robotic Autonomous Behavior Selection Using Episodic Memory and Attention System

Distinct replay signatures for prospective decision-making and memory preservation

Policy adjustment in a dynamic economic game

Reinforcement Learning and its Connections with Neuroscience and Psychology

Continuous Episodic Control

Sequential memory improves sample and memory efficiency in Episodic Control

Deep Reinforcement Learning with Parametric Episodic Memory

Spatial Cognition and Decision Model Based on Hippocampus-Prefrontal Cortex Interaction

Robotic Episodic Learning and Behaviour Control Integrated with Neuron Stimulation Mechanism