Statistical information about reward timing is insufficient for promoting optimal persistence decisions
Karolina M Lempert, Lena Schaefer, Darby Breslow, Thomas D Peterson, Joseph W Kable, Joseph T McGuire
DOI: https://doi.org/10.1016/j.cognition.2023.105468
IF: 4.011
Cognition
Abstract: When deciding how long to keep waiting for delayed rewards that will arrive at an uncertain time, different distributions of possible reward times dictate different optimal strategies for maximizing reward. When reward timing distributions are heavy-tailed (e.g., waiting on hold) there is a point at which waiting is no longer advantageous because the opportunity cost of waiting is too high. Alternatively, when reward timing distributions have more predictable timing (e.g., uniform), it is advantageous to wait as long as necessary for the reward. Although people learn to approximate optimal strategies, little is known about how this learning occurs. One possibility is that people learn a general cognitive representation of the probability distribution that governs reward timing and then infer a strategy from that model of the environment. Another possibility is that they learn an action policy in a way that depends more narrowly on direct task experience, such that general knowledge of the reward timing distribution is insufficient for expressing the optimal strategy. Here, in a series of studies in which participants decided how long to persist for delayed rewards before quitting, we provided participants with information about the reward timing distribution in several ways. Whether the information was provided through counterfactual feedback (Study 1), previous exposure (Studies 2a and 2b), or description (Studies 3a and 3b), it did not obviate the need for direct, feedback-driven learning in a decision context. Therefore, learning when to quit waiting for delayed rewards might depend on task-specific experience, not solely on probabilistic reasoning.
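The contrast the abstract draws between heavy-tailed and uniform timing can be illustrated with a small numerical sketch. It is not the authors' analysis. It assumes a simple reward-rate criterion, expected reward per trial divided by expected time per trial, and every name and parameter in it is hypothetical: a uniform delay on 0 to 12 s, a Lomax-type heavy-tailed delay, a 2 s inter-trial interval, a unit reward, and a policy that quits at a fixed give-up time.

```python
# Illustrative sketch only: the distributions, parameters, and reward-rate
# criterion below are assumptions, not values or methods from the study.

def uniform_survival(t, upper=12.0):
    """P(reward has not arrived by time t) for delays uniform on [0, upper]."""
    return max(0.0, 1.0 - t / upper)


def heavy_tail_survival(t, scale=2.0, shape=0.8):
    """Survival function of a Lomax (Pareto type II) delay, a heavy-tailed example."""
    return (1.0 + t / scale) ** (-shape)


def reward_rate(survival, give_up, iti=2.0, reward=1.0, steps=2000):
    """Expected reward per second when every trial is abandoned at `give_up`."""
    dt = give_up / steps
    # E[min(delay, give_up)] is the integral of the survival function on
    # [0, give_up]; approximate it with the trapezoid rule.
    waited = sum(
        0.5 * (survival(i * dt) + survival((i + 1) * dt)) * dt
        for i in range(steps)
    )
    p_reward = 1.0 - survival(give_up)  # chance the reward arrives before quitting
    return reward * p_reward / (waited + iti)


# Candidate give-up times from 0.5 s to 12 s in 0.5 s steps.
candidates = [0.5 * k for k in range(1, 25)]

best_uniform = max(candidates, key=lambda t: reward_rate(uniform_survival, t))
best_heavy = max(candidates, key=lambda t: reward_rate(heavy_tail_survival, t))

print("best give-up time, uniform delays:", best_uniform)
print("best give-up time, heavy-tailed delays:", best_heavy)
```

Under these assumed parameters, the best give-up time for uniform delays is the longest candidate (full persistence), while the best give-up time for heavy-tailed delays is an intermediate value. This matches the qualitative pattern the abstract describes.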