Relating Human Error-Based Learning to Modern Deep RL Algorithms

Michele Garibbo, Casimir J H Ludwig, Nathan F Lepora, Laurence Aitchison
DOI: https://doi.org/10.1162/neco_a_01721
2024-10-08
Abstract: In human error-based learning, the size and direction of a scalar error (i.e., the "directed error") are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error-based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error-based learning. We show that all three deep RL approaches are qualitatively different from human error-based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error-based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error-based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.