Regulation of reinforcement learning parameters captures long‐term changes in rat behaviour

François Cinotti, Etienne Coutureau, Mehdi Khamassi, Alain R. Marchand, Benoît Girard
DOI: https://doi.org/10.1111/ejn.16449
Impact factor: 3.698
2024-06-25
European Journal of Neuroscience
Abstract: In a three‐armed bandit task conducted over several sessions, rats show improved performance and decreased exploration, which cannot be captured by a Q‐learning model with static parameters. Meta‐learning models in which the average reward rate regulates either the exploration‐exploitation trade‐off or the rate of learning capture these long‐term changes.

In uncertain environments in which resources fluctuate continuously, animals must continually decide whether to stabilise learning and exploit what they currently believe to be their best option, or instead explore potential alternatives and learn fast from new observations. While this trade‐off has been extensively studied in pretrained animals facing non‐stationary decision‐making tasks, it remains unknown how animals progressively tune it while learning the task structure during pretraining. Here, we compared the ability of different computational models to account for long‐term changes in the behaviour of 24 rats while they learned to choose a rewarded lever in a three‐armed bandit task across 24 days of pretraining. We found that the day‐by‐day evolution of rat performance and win‐shift tendency revealed a progressive stabilisation of the way they regulated reinforcement learning parameters. We successfully captured these behavioural adaptations using a meta‐learning model in which either the learning rate or the inverse temperature was controlled by the average reward rate.
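The two model classes contrasted in the abstract can be illustrated with a minimal sketch, assuming a standard Q-learning agent with softmax action selection on a three-armed bandit with Bernoulli rewards. The reward probabilities, parameter ranges, the exponential running average of reward (`rbar_rate`) and the linear mappings from average reward rate to the inverse temperature (beta) or learning rate (alpha) are illustrative assumptions, not the equations fitted in the paper. The mapping directions follow the abstract's framing: a higher reward rate pushes the agent toward stabilised learning and exploitation. The helper `win_shift_rate` is likewise a hypothetical way to measure win-shift tendency, as the fraction of rewarded trials followed by a switch to a different lever.

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Sample an arm with probability proportional to exp(beta * Q[arm])."""
    m = max(q)
    # Subtract the max Q-value before exponentiating to avoid overflow.
    weights = [math.exp(beta * (v - m)) for v in q]
    threshold = rng.random() * sum(weights)
    acc = 0.0
    for arm, w in enumerate(weights):
        acc += w
        if threshold < acc:
            return arm
    return len(q) - 1  # guard against floating-point round-off


def simulate(p_reward, n_trials, meta=None, alpha=0.3, beta=3.0,
             alpha_range=(0.05, 0.5), beta_range=(1.0, 10.0),
             rbar_rate=0.05, seed=0):
    """Q-learning agent with softmax choice on a Bernoulli bandit.

    meta=None    : static alpha and beta (the baseline model).
    meta="beta"  : inverse temperature rises linearly with the average reward rate.
    meta="alpha" : learning rate falls linearly with the average reward rate.
    Both linear mappings are hypothetical, chosen for illustration only.
    Returns (choices, rewards, trace), where trace holds the per-trial
    regulated parameter (alpha if meta == "alpha", otherwise beta).
    """
    rng = random.Random(seed)
    q = [0.0] * len(p_reward)
    r_bar = 0.0  # running estimate of the average reward rate, stays in [0, 1]
    choices, rewards, trace = [], [], []
    for _ in range(n_trials):
        if meta == "beta":
            lo, hi = beta_range
            beta = lo + (hi - lo) * r_bar   # richer context -> more exploitation
        elif meta == "alpha":
            lo, hi = alpha_range
            alpha = hi - (hi - lo) * r_bar  # richer context -> slower, stabler learning
        trace.append(alpha if meta == "alpha" else beta)
        arm = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < p_reward[arm] else 0.0
        q[arm] += alpha * (reward - q[arm])    # delta-rule update of the chosen arm only
        r_bar += rbar_rate * (reward - r_bar)  # slow exponential average of reward
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards, trace


def win_shift_rate(choices, rewards):
    """Fraction of rewarded trials followed by a switch to a different arm."""
    wins = [t for t in range(len(choices) - 1) if rewards[t] > 0]
    if not wins:
        return float("nan")
    return sum(choices[t + 1] != choices[t] for t in wins) / len(wins)
```

For example, `simulate([0.7, 0.25, 0.05], n_trials=500, meta="beta")` returns the choices, rewards and per-trial beta, and the same call with `meta=None` gives the static-parameter baseline, so `win_shift_rate` can be compared between the two. Under the assumed mapping, beta starts at its lower bound and drifts upward as rewards accumulate, which only qualitatively mirrors the decreasing exploration the abstract reports.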
neurosciences