Abstract: Learning to make adaptive decisions involves making choices, assessing their consequences, and leveraging this assessment to attain more rewarding states. Despite a vast literature on value-based decision-making, relatively little is known about the cognitive processes underlying decisions in highly uncertain contexts. Real-world decisions are rarely accompanied by immediate feedback, explicit rewards, or complete knowledge of the environment. Making informed decisions in such contexts requires significant knowledge about the environment, which can only be gained via exploration. Here we aim to understand and formalize the brain mechanisms underlying these processes. To this end, we first designed and performed an experimental task. Human participants had to learn to maximize reward while making sequences of decisions with only basic knowledge of the environment and in the absence of explicit performance cues. Participants had to rely on their own internal assessment of performance to uncover a covert relationship between their choices and the subsequent consequences, and thereby find a strategy leading to the highest cumulative reward. Our results show that participants' reaction times were longer whenever the decision involved a future consequence, suggesting greater introspection whenever a delayed value had to be considered. Learning time varied significantly across participants. Second, we formalized the neurocognitive processes underlying decision-making in this task by combining mean-field representations of competing neural populations with a reinforcement learning mechanism. This model provided a plausible characterization of the brain dynamics underlying these processes and reproduced each aspect of the participants' behavior, from their reaction times and choices to their learning rates.
In summary, both the experimental results and the model provide a principled explanation of how delayed value may be computed and incorporated into the neural dynamics of decision-making, and of how learning occurs in these uncertain scenarios.

One-shot learning and behavioral eligibility traces in sequential decision making

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Short-term Memory Traces for Action Bias in Human Reinforcement Learning

Expected Eligibility Traces

Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

Distinct replay signatures for prospective decision-making and memory preservation

Trial-by-trial learning of successor representations in human behavior

Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making

Tracking subjects’ strategies in behavioural choice experiments at trial resolution

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task

Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

Theta sequences as eligibility traces: a biological solution to credit assignment

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Cognitive mechanisms of learning in sequential decision-making under uncertainty: an experimental and theoretical approach

Instance-based learning: Integrating sampling and repeated decisions from experience.

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

How trial-to-trial learning shapes mappings in the mental lexicon: Modelling Lexical Decision with Linear Discriminative Learning

Provable Reinforcement Learning with a Short-Term Memory

Learning Non-Markovian Decision-Making from State-only Sequences