How cortico-basal ganglia-thalamic subnetworks can shift decision policies to maximize reward rate

Jyotika Bahuguna, Timothy V Verstynen, Jonathan Rubin
DOI: https://doi.org/10.1101/2024.05.21.595174
2024-05-22
Abstract: All mammals exhibit flexible decision policies that depend, at least in part, on the cortico-basal ganglia-thalamic (CBGT) pathways. Yet understanding how the complex connectivity, dynamics, and plasticity of CBGT circuits translates into experience-dependent shifts of decision policies represents a longstanding challenge in neuroscience. Here we used a computational approach to address this problem. Specifically, we simulated decisions driven by CBGT circuits under baseline, unrewarded conditions using a spiking neural network, and fit the resulting behavior to an evidence accumulation model. Using canonical correlation analysis, we then replicated the existence of three recently identified control ensembles (responsiveness, pliancy and choice) within CBGT circuits, with each ensemble mapping to a specific configuration of the evidence accumulation process. We subsequently simulated learning in a simple two-choice task with one optimal (i.e., rewarded) target. We find that value-based learning, via dopaminergic signals acting on cortico-striatal synapses, effectively manages the speed-accuracy tradeoff so as to increase reward rate over time. Within this process, learning-related changes in decision policy can be decomposed in terms of the contributions of each control ensemble, and these changes are driven by sequential reward prediction errors on individual trials. Our results provide a clear and simple mechanism for how dopaminergic plasticity shifts specific subnetworks within CBGT circuits so as to strategically modulate decision policies in order to maximize effective reward rate.
Neuroscience
What problem does this paper attempt to address?
This paper asks how cortico-basal ganglia-thalamic (CBGT) circuits shift decision policies with experience so as to maximize reward rate. The authors first simulate decisions driven by a spiking neural network model of CBGT circuits under baseline, unrewarded conditions and fit the resulting behavior to an evidence accumulation model. Using canonical correlation analysis, they replicate three recently identified control ensembles (responsiveness, pliancy, and choice), each of which maps to a specific configuration of the evidence accumulation process. They then simulate learning in a simple two-choice task with one rewarded target. In these simulations, value-based learning, via dopaminergic signals acting on cortico-striatal synapses, manages the speed-accuracy tradeoff so that reward rate increases over time. The learning-related changes in decision policy can be decomposed into contributions from each of the three control ensembles, and they are driven by sequential reward prediction errors on individual trials. The findings offer a clear and simple mechanism by which dopaminergic plasticity shifts specific subnetworks within CBGT circuits to strategically modulate decision policies and maximize effective reward rate.
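To make the link between an evidence accumulation process, the speed-accuracy tradeoff, and reward rate concrete, here is a minimal Python sketch. It is not the authors' model: it assumes a generic two-boundary drift-diffusion process with unit noise, and it assumes reward rate can be approximated as mean accuracy divided by mean reaction time. The function names (`simulate_trial`, `summarize`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_trial(drift, boundary, non_decision_time=0.3, dt=0.001, rng=random):
    """Simulate one two-boundary evidence accumulation trial.

    Evidence starts midway between 0 and `boundary` (no starting bias) and
    evolves with constant drift plus Gaussian noise of unit variance per
    second. Returns (correct, reaction_time_in_seconds).
    """
    x = boundary / 2.0
    t = 0.0
    noise_scale = math.sqrt(dt)
    while 0.0 < x < boundary:
        x += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
        t += dt
    # Hitting the upper boundary is treated as choosing the rewarded target.
    return x >= boundary, t + non_decision_time


def summarize(drift, boundary, n_trials=500, seed=0):
    """Return (accuracy, mean_rt, reward_rate) over simulated trials.

    Assumption: reward rate is approximated as accuracy / mean RT,
    i.e. expected rewards per second of decision time.
    """
    rng = random.Random(seed)
    trials = [simulate_trial(drift, boundary, rng=rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in trials) / n_trials
    mean_rt = sum(rt for _, rt in trials) / n_trials
    return accuracy, mean_rt, accuracy / mean_rt


if __name__ == "__main__":
    # Compare a weak-evidence policy with a strong-evidence policy.
    for drift in (0.2, 1.5):
        acc, rt, rr = summarize(drift, boundary=1.0)
        print(f"drift={drift}: accuracy={acc:.2f}, mean RT={rt:.2f}s, reward rate={rr:.2f}/s")
```

In this toy setting, raising the drift rate improves accuracy, and widening the boundary trades longer reaction times for higher accuracy. The paper's claim is that dopaminergic learning moves such parameters in a direction that raises reward rate, though the exact parameterization used there is not given in the abstract.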