Abstract: Online model-free reinforcement learning (RL) approaches play a crucial role in real-world applications such as behavioral decision making in robotics. Balancing exploration and exploitation is a central problem in RL: the ratio between the two strongly influences both the total learning time and the quality of the learned strategy. Various action selection policies have therefore been proposed to balance exploration and exploitation. However, these approaches are rarely regulated automatically and dynamically in response to environmental variations. One of the most remarkable self-adaptation mechanisms in animals is their capacity to switch dynamically between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model that simulates the roles of the medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) then receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in the anterior cingulate cortex (ACC) calculate the vigilance from this signal. The vigilance is then transmitted to the LPFC to regulate the exploration rate, and finally the exploration rate is transmitted to the thalamus to calculate the corresponding action probabilities. This action selection mechanism is introduced into the actor–critic model of the basal ganglia and combined with a cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model that selects the agent's actions. Both a simulation comparison with four other traditional action selection policies and physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.
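
The signal chain described in the abstract (reward, dopaminergic reinforcement signal, vigilance, exploration rate, action probability) can be illustrated with a minimal Python sketch. The abstract gives no equations, so every formula here is an assumption chosen for illustration: the reinforcement signal is modeled as a standard temporal-difference error, the vigilance as a running average of its magnitude, the exploration rate as a softmax temperature, and the action probabilities as a Boltzmann distribution. All function names and parameter values are hypothetical and are not taken from the article.

```python
import math

def td_error(reward, value, next_value, gamma=0.9):
    """Assumed stand-in for the VTA signal: a standard TD error."""
    return reward + gamma * next_value - value

def update_vigilance(vigilance, delta, rate=0.1):
    """Assumed stand-in for the ACC step: running average of the
    clipped magnitude of the reinforcement signal, kept in [0, 1]."""
    surprise = min(1.0, abs(delta))
    return vigilance + rate * (surprise - vigilance)

def exploration_temperature(vigilance, t_min=0.05, t_max=2.0):
    """Assumed stand-in for the LPFC step: map vigilance in [0, 1]
    linearly onto a softmax temperature (the exploration rate)."""
    return t_min + vigilance * (t_max - t_min)

def action_probabilities(preferences, temperature):
    """Assumed stand-in for the thalamus step: Boltzmann (softmax)
    distribution over action preferences at the given temperature."""
    scaled = [p / temperature for p in preferences]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    preferences = [1.0, 0.5, 0.0]  # hypothetical actor outputs
    vigilance = 0.0
    # A run of surprising outcomes raises vigilance and flattens the policy.
    for step in range(5):
        delta = td_error(reward=0.0, value=1.0, next_value=0.0)
        vigilance = update_vigilance(vigilance, delta)
        temperature = exploration_temperature(vigilance)
        print(step, action_probabilities(preferences, temperature))
```

Under these assumptions, repeated prediction errors raise the vigilance, which raises the temperature and spreads probability mass across actions (more exploration); when predictions are accurate, the vigilance decays and the policy concentrates on the preferred action (more exploitation).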

Dorsolateral prefrontal cortex drives strategic aborting by optimizing long-run policy extraction

Ventrolateral prefrontal cortex in macaques guides decisions in different learning contexts

Policy adjustment in a dynamic economic game

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Competing neural representations of choice shape evidence accumulation in humans

Dissociable representations of decision variables within subdivisions of macaque orbitofrontal and ventrolateral frontal cortex

How cortico-basal ganglia-thalamic subnetworks can shift decision policies to maximize reward rate

From conflict management to reward-based decision making: actors and critics in primate medial frontal cortex

Rational inattention in neural coding for economic choice

Primate thalamic nuclei select abstract rules and shape prefrontal dynamics

Human dorsal striatal activity during choice discriminates reinforcement learning behavior from the gambler's fallacy

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning.

Neural dynamics in the orbitofrontal cortex reveal cognitive strategies

Specializations for reward-guided decision-making in the primate ventral prefrontal cortex

Prelimbic neuron assemblies with delayed activation encode the economic decision-making process in a bandit game

Control over a mixture of policies determines change of mind topology during continuous choice

Neural Representation of Cost–benefit Selections in Rat Anterior Cingulate Cortex in Self-Paced Decision Making

Signals in Human Striatum Are Appropriate for Policy Update Rather Than Value Prediction

Primate anterior insular cortex represents economic decision variables proposed by prospect theory

Encoding priors in the brain: a reinforcement learning model for mouse decision making