Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning

Eshaan S Iyer, Megan A Kairiss, Adrian Liu, A Ross Otto, Rosemary C Bagot
DOI: https://doi.org/10.1016/j.jneumeth.2020.108777
2020-07-15
Abstract:
Background: Reinforcement learning (RL) and win stay/lose shift accounts of decision making are both widely used to describe how individuals learn about and interact with rewarding environments. Though mutually informative, these accounts are often conceptualized as independent processes, and so the potential relationships between win stay/lose shift tendencies and RL parameters have not been explored.
New method: We introduce a methodology to directly relate RL parameters to behavioral strategy. Specifically, we simulate win stay/lose shift tendencies across the RL parameter space and use these simulations to calculate a truncated multivariate normal distribution of RL parameters given win stay/lose shift tendencies. Maximizing this distribution for a given set of win stay/lose shift tendencies yields an approximation of the RL parameters.
Results: We demonstrate novel relationships between win stay/lose shift tendencies and RL parameters that challenge conventional interpretations of lose shift as a metric of loss sensitivity. Further, we demonstrate in both simulated and empirical data that this method of parameter approximation yields reliable parameter recovery.
Comparison with existing method: We compare this method against the conventionally used maximum likelihood estimation method for parameter approximation in simulated noisy data and in empirical data. For simulated noisy data, this method performs similarly to maximum likelihood estimation. For empirical data, however, this method provides a more reliable approximation of RL parameters than maximum likelihood estimation.
Conclusions: We demonstrate the existence of relationships between win stay/lose shift tendencies and RL parameters and introduce a method that leverages these relationships to enable recovery of RL parameters exclusively from win stay/lose shift tendencies.
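
The simulation step described under "New method" can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a two-armed bandit with reward probabilities 0.8 and 0.2, a Q-learning agent with learning rate `alpha` and softmax inverse temperature `beta`, and the hypothetical helper names `simulate_wsls` and `approximate_params`. The lookup at the end is a crude nearest-neighbour stand-in, not the truncated multivariate normal maximization the abstract describes.

```python
import math
import random


def simulate_wsls(alpha, beta, n_trials=2000, p_reward=(0.8, 0.2), seed=0):
    """Simulate a Q-learner on a two-armed bandit (assumed task) and
    return its (win_stay, lose_shift) proportions."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # action values for the two arms
    prev_choice = prev_reward = None
    wins = stays_after_win = losses = shifts_after_loss = 0

    for _ in range(n_trials):
        # Softmax choice rule for two options (logistic in the value difference).
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose_0 else 1
        reward = 1 if rng.random() < p_reward[choice] else 0

        # Score the current choice against the previous trial's outcome.
        if prev_choice is not None:
            if prev_reward == 1:
                wins += 1
                stays_after_win += int(choice == prev_choice)
            else:
                losses += 1
                shifts_after_loss += int(choice != prev_choice)

        # Delta-rule update of the chosen arm only.
        q[choice] += alpha * (reward - q[choice])
        prev_choice, prev_reward = choice, reward

    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


def approximate_params(win_stay, lose_shift, table):
    """Nearest-neighbour lookup (a stand-in, NOT the paper's estimator):
    return the (alpha, beta) whose simulated tendencies are closest."""
    def distance(key):
        ws, ls = table[key]
        return (ws - win_stay) ** 2 + (ls - lose_shift) ** 2
    return min(table, key=distance)


# Sweep a coarse, arbitrary grid of parameters and record the tendencies.
grid = [(a, b) for a in (0.1, 0.5, 0.9) for b in (1.0, 5.0, 10.0)]
table = {(a, b): simulate_wsls(a, b) for a, b in grid}
```

Calling `approximate_params(ws, ls, table)` with observed tendencies returns the closest grid point. A finer grid and a fitted conditional distribution, as the abstract describes, would replace this lookup in practice.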