Abstract: Most studies assessing animal decision-making under risk rely on probabilities that are typically larger than 10%. To study decision-making in uncertain conditions, we explore a novel experimental and modelling approach that aims at measuring the extent to which rats are sensitive - and how they respond - to outcomes that are both rare (probabilities smaller than 1%) and extreme in their consequences (deviations larger than 10 times the standard error). In a four-armed bandit task, stochastic gains (sugar pellets) and losses (time-out punishments) are such that extremely large - but rare - outcomes materialize or not depending on the chosen options. All rats feature both limited diversification, mixing two options out of four, and sensitivity to rare and extreme outcomes despite their infrequent occurrence, by combining options with avoidance of extreme losses (Black Swans) and exposure to extreme gains (Jackpots). Notably, this sensitivity turns out to be one-sided for the main phenotype in our sample: it features a quasi-complete avoidance of Black Swans, so as to escape extreme losses almost completely, which contrasts with an exposure to Jackpots that is only partial. The flip side of the observed choices is that they entail smaller gains and larger losses in the frequent domain compared to alternatives. We introduce sensitivity to Black Swans and Jackpots in a new class of augmented Reinforcement Learning models and estimate their parameters using observed choices and outcomes for each rat. Adding such specific sensitivity results in a good fit of the selected model - and simulated behaviors that are close - to behavioral observations, whereas a standard Q-Learning model without sensitivity is rejected for almost all rats. This model, which reproduces the main phenotype, suggests that frequent outcomes are treated separately from rare and extreme ones through different weights in decision-making.
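
The contrast the abstract draws between a standard Q-Learning model and one augmented with specific sensitivity to rare and extreme outcomes can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not give the functional form, so the threshold rule, the multiplicative weights `w_black_swan` and `w_jackpot`, the softmax choice rule, and all parameter names below are assumptions made purely for illustration.

```python
import math


def softmax(q_values, beta):
    """Choice probabilities over options (assumed choice rule, inverse temperature beta)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def subjective_outcome(outcome, extreme_threshold, w_black_swan, w_jackpot):
    """Reweight outcomes whose magnitude reaches a (hypothetical) extreme threshold.

    Frequent outcomes pass through unchanged; extreme losses and extreme gains
    each get their own weight, so sensitivity can be one-sided.
    """
    if outcome <= -extreme_threshold:
        return w_black_swan * outcome  # extreme loss (Black Swan)
    if outcome >= extreme_threshold:
        return w_jackpot * outcome  # extreme gain (Jackpot)
    return outcome


def q_update(q_values, action, outcome, alpha,
             extreme_threshold, w_black_swan, w_jackpot):
    """One delta-rule update of the chosen option; returns a new list of Q-values."""
    r = subjective_outcome(outcome, extreme_threshold, w_black_swan, w_jackpot)
    updated = list(q_values)
    updated[action] += alpha * (r - updated[action])
    return updated


# Four-armed bandit: option 1 has just produced an extreme loss.
q_augmented = q_update([0.0] * 4, 1, -20.0, 0.5, 10.0, 3.0, 1.0)
q_standard = q_update([0.0] * 4, 1, -20.0, 0.5, 10.0, 1.0, 1.0)
choice_probs = softmax(q_augmented, beta=1.0)
```

Setting both weights to 1 recovers the standard delta-rule update, which makes the two model classes nested in this sketch. A weight above 1 on Black Swans alone pushes the chosen option's value down more sharply after an extreme loss, which is one simple way to express one-sided sensitivity.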

Regulation of reinforcement learning parameters captures long‐term changes in rat behaviour

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

A meta reinforcement learning account of behavioral adaptation to volatility in recurrent neural networks

Meta-reinforcement learning via orbitofrontal cortex

An inductive bias for slowly changing features in human reinforcement learning

Mesolimbic dopamine encodes reward prediction errors independent of learning rates

A global dopaminergic learning rate enables adaptive foraging across many options

Recurrent networks endowed with structural priors explain suboptimal animal behavior

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Lifelong Reinforcement Learning via Neuromodulation

Learning at Variable Attentional Load Requires Cooperation of Working Memory, Meta-learning, and Attention-augmented Reinforcement Learning

Importance of prefrontal meta control in human-like reinforcement learning

Dopamine transients encode reward prediction errors independent of learning rates

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning

Mesolimbic dopamine adapts the rate of learning from action

Performance errors during rodent learning reflect a dynamic choice strategy

Specific Sensitivity to Rare and Extreme Events: Quasi-Complete Black Swan Avoidance vs Partial Jackpot Seeking in Rat Decision-Making

Change point estimation by the mouse medial frontal cortex during probabilistic reward learning

Decision Confidence and Outcome Variability Optimally Regulate Separate Aspects of Hyperparameter Setting

Striatal dopamine reflects individual long-term learning trajectories

Exploration-exploitation mechanisms in recurrent neural networks and human learners in restless bandit problems