Meta-Reinforcement Learning reconciles surprise, value and control in the anterior cingulate cortex.

Tim Vriens, Eliana Vassena, Giovanni Pezzulo, Gianluca Baldassarre, Massimo Silvetti
DOI: https://doi.org/10.1101/2024.05.15.592711
2024-05-15
Abstract: The role of the dorsal anterior cingulate cortex (dACC) in cognition is a frequently studied yet highly debated topic in neuroscience. Most authors agree that the dACC is involved in either cognitive control (e.g. voluntary inhibition of automatic responses) or monitoring (e.g. comparing expectations with outcomes, detecting errors, tracking surprise). A consensus on which theoretical perspective best explains the dACC's contribution to behaviour is still lacking. In a recent neuroimaging study, the experimental predictions of two prominent models, one formalizing the cognitive control hypothesis (Expected Value of Control, EVC) and one formalizing the monitoring hypothesis (Predicted Response Outcome, PRO), were tested using a behavioural task involving both monitoring and cognitive control mechanisms. The results indicated that, of the two tested models, only the PRO model effectively predicted dACC activity, pointing to surprise tracking for performance monitoring as the sole underlying mechanism, even when cognitive control was required by the task at hand. These findings challenged the long-standing and established cognitive control hypothesis of dACC function and opened a theory crisis: the proposed surprise-monitoring hypothesis cannot account for a wide array of previous experimental findings showing dACC activation in tasks that require cognitive control without involving monitoring or surprise. Here we propose a novel hypothesis on dACC function that integrates the monitoring and cognitive control perspectives in a unifying, coherent framework based on meta-Reinforcement Learning. Our model, the Reinforcement Meta Learner (RML), optimizes cognitive control (as in control models like EVC) by meta-learning based on tracking surprise (as in monitoring models like PRO).
We tested RML experimental predictions with the same behavioural task used to compare the PRO and EVC models, and showed that RML predictions on dACC activity matched PRO predictions and outperformed EVC predictions. However, crucially, the RML simultaneously accounts for both cognitive control and monitoring functions, resolving the theoretical impasse about dACC function within an integrative framework. In sum, our results suggest that dACC function can be framed as a meta-learning optimiser of cognitive control, providing an integrative perspective on its roles in cognitive control, surprise tracking, and performance monitoring.
Neuroscience