Abstract: Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.

Reinforcement learning unifies neuroscience and AI with a universal computational framework for motivated behavior. Humans and robots alike are active and embodied agents who physically interact with the world and learn from feedback to guide future actions while weighing costs of time and energy. Initially, the modeling here attempted to identify learning algorithms for an interactive environment structured with patterns in counterfactual information that a human brain could learn to generalize. However, behavioral analysis revealed that a wider scope was necessary to identify individual differences in not only complex learning but also action bias and hysteresis. Sequential choices in the pursuit of rewards were clearly influenced by endogenous action preferences and by persistent effects of action history that caused repetition or alternation of previous actions. By modeling a modular brain as a mixture of expert and nonexpert systems for behavioral control, a distinct profile could be characterized for each individual attempting the experiment. Even for actions as simple as button pressing, effects specific to actions were as substantial as the effects of the reward outcomes that decisions were supposed to follow. Bias and hysteresis are concluded to be ubiquitous and intertwined with processes of active reinforcement learning for efficiency in behavior.
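The idea of parallel modules for learning, action bias, and hysteresis can be made concrete with a minimal sketch. It is not the authors' fitted model: the softmax choice rule, the delta-rule value update, the exponentially decaying action trace, and every name and parameter (`bias`, `kappa`, `alpha`, `decay`) are assumptions chosen for illustration. A positive `kappa` favors repeating recent actions and a negative `kappa` favors alternating away from them.

```python
import math

def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def choice_probabilities(q_values, bias, trace, kappa):
    """Sum the outputs of three hypothetical modules per action:
    learned value (expert), constant action bias (nonexpert), and
    hysteresis from the action-history trace (nonexpert)."""
    prefs = [q + b + kappa * h for q, b, h in zip(q_values, bias, trace)]
    return softmax(prefs)

def update(q_values, trace, action, reward, alpha=0.2, decay=0.5):
    """Update learned values and the action trace in place (assumed forms)."""
    # Delta-rule update of the chosen action's value from the reward outcome.
    q_values[action] += alpha * (reward - q_values[action])
    # Exponentially decaying trace, so several previous actions still count.
    for a in range(len(trace)):
        trace[a] = decay * trace[a] + (1.0 if a == action else 0.0)

# Example: two actions, nothing learned yet, and an alternation tendency.
q = [0.0, 0.0]
trace = [0.0, 0.0]
update(q, trace, action=0, reward=0.0)
p_alternate = choice_probabilities(q, [0.0, 0.0], trace, kappa=-1.0)
```

In this sketch, `p_alternate[0]` falls below 0.5 after action 0 has just been chosen, even though both actions have equal learned value. That is the kind of choice variance that reward-driven learning alone would not explain.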

CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias

Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

A double Actor-Critic learning system embedding improved Monte Carlo tree search

Controlling Estimation Error in Reinforcement Learning via Reinforced Operation

Integrated Double Estimator Architecture for Reinforcement Learning

A reinforcement learning diffusion decision model for value-based decisions

Dissociable Neural Representations of Reinforcement and Belief Prediction Errors Underlie Strategic Learning

Adaptive Order Q-learning

LiFE: Deep Exploration Via Linear-Feature Bonus in Continuous Control

Moderate confirmation bias enhances decision-making in groups of reinforcement-learning agents

Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

Moderate confirmation bias enhances collective decision-making in reinforcement-learning agents

Controlling Underestimation Bias in Reinforcement Learning Via Minmax Operation

Self-correcting Q-learning

Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts

Multi-Input Autonomous Driving Based on Deep Reinforcement Learning with Double Bias Experience Replay

Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning

LIDAR: Learning from Imperfect Demonstrations with Advantage Rectification

Adapting Double Q-Learning for Continuous Reinforcement Learning