Abstract: Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according to different reward dimensions, such as food or water. We show that by weighting these learned values according to current needs, behaviour can be flexibly adapted to present preferences. This model predicts that individual dopamine neurons should encode the errors associated with some reward dimensions more than with others. To provide a preliminary test of this prediction, we reanalysed a small dataset obtained from a single primate in an experiment which, to our knowledge, is the only published study in which the responses of dopamine neurons to stimuli predicting distinct types of reward were recorded. We observed that, in addition to subjective economic value, dopamine neurons encode a gradient of reward dimensions: some neurons respond most to stimuli predicting food rewards, while others respond more to stimuli predicting fluids. We also propose a possible implementation of the model in the basal ganglia network and demonstrate how the striatal system can learn values in multiple dimensions, even when dopamine neurons encode mixtures of prediction errors from different dimensions. Additionally, the model reproduces the instant generalisation to new physiological states seen in dopamine responses and in behaviour. Our results demonstrate how a simple neural circuit can flexibly guide behaviour according to animals' needs.

Animals and humans can search for different resources depending on their needs. For example, when you are thirsty at work, you may go to a common room where, hopefully, coffee or water is available, whereas if you are hungry, you would rather go to a canteen.
This ability to seek different resources based on physiological state is so fundamental to survival that it is present even in simple animals. This paper proposes how the ability could arise from a simple neural circuit that can be mapped onto evolutionarily older parts of the vertebrate brain, called the basal ganglia. The model suggests that this circuit learns the availability of different reward types and then combines them according to the physiological state to control behaviour.
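The core computation can be illustrated with a minimal Python sketch: learn a separate value for each reward dimension, then combine those values according to current needs. This is a sketch under stated assumptions, not the authors' implementation. It assumes a simple delta-rule (Rescorla-Wagner style) update per dimension instead of a full temporal-difference model. The names `MultiObjectiveCritic` and `choose`, the cue labels, the learning rate, and the need weights are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical reward dimensions used only for this illustration.
REWARD_DIMS = ("food", "water")


@dataclass
class MultiObjectiveCritic:
    """Hypothetical sketch: one learned value per reward dimension per stimulus."""
    alpha: float = 0.1  # assumed learning rate
    values: dict = field(default_factory=dict)  # stimulus -> {dimension: value}

    def value_vector(self, stimulus):
        # Unseen stimuli start with zero value in every dimension.
        return self.values.setdefault(stimulus, {d: 0.0 for d in REWARD_DIMS})

    def update(self, stimulus, reward):
        """Delta-rule update (assumed); returns one prediction error per dimension."""
        v = self.value_vector(stimulus)
        errors = {}
        for d in REWARD_DIMS:
            delta = reward.get(d, 0.0) - v[d]  # dimension-specific prediction error
            v[d] += self.alpha * delta
            errors[d] = delta
        return errors

    def subjective_value(self, stimulus, needs):
        """Weight the learned per-dimension values by the current needs."""
        v = self.value_vector(stimulus)
        return sum(needs.get(d, 0.0) * v[d] for d in REWARD_DIMS)


def choose(critic, stimuli, needs):
    """Pick the stimulus with the highest need-weighted value."""
    return max(stimuli, key=lambda s: critic.subjective_value(s, needs))


# Learning phase: one cue predicts food, the other predicts water.
critic = MultiObjectiveCritic(alpha=0.1)
for _ in range(200):
    critic.update("cue_food", {"food": 1.0})
    critic.update("cue_water", {"water": 1.0})

# Hypothetical need weights for two physiological states.
hungry = {"food": 1.0, "water": 0.2}
thirsty = {"food": 0.2, "water": 1.0}
print(choose(critic, ["cue_food", "cue_water"], hungry))   # cue_food
print(choose(critic, ["cue_food", "cue_water"], thirsty))  # cue_water
```

In this sketch the per-dimension values are learned independently of the need weights. Changing `needs` therefore changes the preferred cue immediately, with no further learning. This loosely mirrors the instant generalisation to new physiological states described above.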
