A dynamical policy search model for matching law.

ZhenBo Cheng, Zhidong Deng
DOI: https://doi.org/10.1109/BICTA.2010.5645345
2010-01-01
Abstract: The matching law states that the fraction of choices allocated to any option matches the fraction of total rewards earned from that option. However, matching behavior does not necessarily yield the optimal reward, and it is unclear why subjects frequently exhibit matching rather than optimal behavior. In this study, an optimal algorithm is proposed on the basis of the policy search model in reinforcement learning, and a policy algorithm that leads to the matching law is derived from it. Theoretical analysis and simulation results show that the decision behavior produced by our algorithm reaches the matching law under many kinds of reward schedules. Our results indicate that the matching law can emerge whenever the subject tries to maximize a value function under the simple assumption that past choice behavior takes no account of the values of future long-run reward. These results reveal the relationship between matching behavior and the algorithm of optimal policy search. © 2010 IEEE.
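To make the matching law as stated in the abstract concrete, the following minimal Python sketch compares choice fractions with reward fractions for a hypothetical two-option session. It is not the paper's policy search algorithm; the function names (`fractions`, `matching_gap`) and the sample counts are assumptions made purely for illustration.

```python
def fractions(counts):
    """Convert raw per-option counts into fractions of the total."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def matching_gap(choices, rewards):
    """Largest absolute difference between each option's choice fraction
    and its reward fraction. A gap of zero means the behavior satisfies
    the matching law exactly."""
    choice_frac = fractions(choices)
    reward_frac = fractions(rewards)
    return max(abs(c - r) for c, r in zip(choice_frac, reward_frac))


# Hypothetical data: option A chosen 60 times, option B 40 times;
# rewards earned are 30 from A and 20 from B.
matched_gap = matching_gap([60, 40], [30, 20])

# Hypothetical data: choices split evenly while rewards are not.
unmatched_gap = matching_gap([50, 50], [30, 20])
```

In the first case both the choice and reward fractions are 0.6 and 0.4, so the gap is zero and the behavior matches. In the second case the choice fractions (0.5, 0.5) differ from the reward fractions (0.6, 0.4), so the gap is nonzero. Matching in this sense says nothing about whether total reward is maximal, which is the distinction the abstract draws.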