Abstract: As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning that takes advantage of Human-AI interaction for data driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information for the policy search in a novel and legitimate manner, the proposed super reinforcement learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior one (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies using the batch data, a number of nonparametric and causal identifications are established. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties such as finite-sample regret guarantee. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.
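
The guarantee stated in the abstract can be read as a policy-class inclusion argument. The following is a hedged sketch for the one-step case only; the symbols ($\mathcal{S}$ for observed states, $\mathcal{A}$ for actions, $a'$ for the behavior agent's observed action, $\pi_b$ for the behavior policy, $V$ for policy value) are notation assumed here for illustration and are not taken from the abstract.

```latex
% Standard policies see only the observed state; super-policies also see
% the behavior agent's action a'.
\[
\Pi = \{\pi : \mathcal{S} \to \mathcal{A}\},
\qquad
\Pi_{\mathrm{super}} = \{\nu : \mathcal{S} \times \mathcal{A} \to \mathcal{A}\}.
\]
% Every standard policy is a super-policy that ignores a', and the behavior
% policy is the super-policy that copies a':
\[
\nu_{\pi}(s, a') = \pi(s),
\qquad
\nu_{b}(s, a') = a'.
\]
% Optimizing over the larger class can therefore do no worse than either:
\[
\max_{\nu \in \Pi_{\mathrm{super}}} V(\nu)
\;\ge\;
\max\Bigl\{ \max_{\pi \in \Pi} V(\pi),\; V(\pi_b) \Bigr\}.
\]
```

The inequality says nothing about how to estimate $V(\nu)$ from confounded batch data; that is the role of the identification results the abstract mentions.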

What problem does this paper attempt to address?

This paper addresses how to improve decision-making through human and artificial intelligence (AI) interaction in complex environments with unmeasured confounding. Specifically, it proposes a new paradigm called "Super Reinforcement Learning" (Super RL).

### Research Background and Motivation

As AI is increasingly applied across society, it becomes important to integrate human expertise with the capabilities of AI systems in a way that leverages their respective strengths and reduces risk. In high-stakes areas such as autonomous driving, medical research, and algorithmic trading, combining AI systems and human knowledge is crucial for making better decisions.

### Problems Addressed

- **Challenges of Reinforcement Learning in Unmeasured Confounding Environments**: Traditional offline reinforcement learning methods rely on datasets of past agent behavior, but unobserved variables (unmeasured confounders) may prevent the learner from recovering the optimal policy.
- **Utilizing Human-AI Interaction**: The authors observe that, in the presence of unmeasured confounding, the actions taken by past agents can reveal valuable information that is not recorded in the observed variables. If this information is incorporated into the policy search in a novel and legitimate way, a stronger policy can be obtained.

### Main Contributions

1. **Introduction of the Super Reinforcement Learning Paradigm**: A new decision-making paradigm called "Super Reinforcement Learning" is proposed. It uses not only the observed covariates but also the actions recommended by behavior agents (whether AI or human) as inputs to policy learning. The resulting policy is called the "super-policy" and is guaranteed to outperform both the standard optimal policy and the behavior policy.
2. **Addressing Unmeasured Confounding Issues**: To overcome unmeasured confounding when learning super-policies from offline data, the paper establishes a series of nonparametric and causal identification results. Based on these, several super-policy learning algorithms are developed, and their theoretical properties, such as finite-sample regret guarantees, are systematically studied.
3. **Empirical Validation**: The effectiveness of the proposed Super Reinforcement Learning method is demonstrated through extensive simulation studies and real-world applications.

### Related Work

- The paper distinguishes itself from other research (such as reward shaping, human feedback-based policy adjustment, and safe reinforcement learning) by using the expertise of behavior agents to uncover unobserved information, thereby enhancing the current decision-maker's policy learning.
- Compared with off-policy evaluation (OPE) and learning methods under unmeasured confounding, the distinguishing feature of this paper is that it uses behavior agents' recommendations for decision-making, i.e., it takes the actions recommended by behavior agents as additional features in the decision-making process.
- The paper also extends work on partially observable policy learning and evaluation, without requiring the assumption of no unmeasured confounding.

### Conclusion

In summary, this paper proposes the Super Reinforcement Learning framework, which leverages human-AI interaction to address decision-making problems with unmeasured confounding. Its advantage over the standard optimal policy and the behavior policy is supported by theoretical analysis and empirical studies.
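
The idea summarized above, that a behavior agent's action can stand in for information missing from the recorded covariates, can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the one-step environment, the 0.9 agreement probability, and the helper names `simulate`, `value`, and `best_tabular_value` are all hypothetical. It also evaluates policies with oracle access to the hidden variable, which real batch data would not provide; handling that gap is what the paper's identification results are for.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Toy one-step environment (an assumption made for illustration).

    s   : observed binary state
    u   : unmeasured binary confounder; reward is 1 iff the action equals u
    a_b : action of a behavior agent who can see u
    """
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n)
    u = rng.integers(0, 2, size=n)
    # With probability 0.9 the agent's private signal matches u.
    informed = np.where(rng.random(n) < 0.9, u, 1 - u)
    # In state 0 the agent follows its signal; in state 1 it acts against it,
    # so its action is informative about u but often the wrong choice.
    a_b = np.where(s == 0, informed, 1 - informed)
    return s, u, a_b


def value(actions, u):
    """Oracle value of a policy: average reward, where reward = 1{action == u}."""
    return float(np.mean(actions == u))


def best_tabular_value(key, u):
    """Oracle value of the best deterministic policy that depends only on `key`.

    For each distinct key, the best action is the more frequent value of u
    in that cell, so the cell contributes max(#{u == 0}, #{u == 1}) rewards.
    """
    total = 0
    for k in np.unique(key):
        cell = u[key == k]
        total += max(int(np.sum(cell == 0)), int(np.sum(cell == 1)))
    return total / len(u)


if __name__ == "__main__":
    s, u, a_b = simulate()
    print("standard optimal policy (input: s)      :", best_tabular_value(s, u))
    print("behavior policy                         :", value(a_b, u))
    print("super-policy (input: s and a_b)         :", best_tabular_value(2 * s + a_b, u))
```

With these assumed probabilities, the standard and behavior values come out near 0.5 and the super-policy value near 0.9: the super-policy follows the behavior agent in state 0 and reverses it in state 1, which neither of the other two policies can do.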

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Confounding-Robust Policy Improvement with Human-AI Teams

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving

Human-AI Coordination via Human-Regularized Search and Learning

Human-AI Collaboration in Real-World Complex Environment with Reinforcement Learning

Shared Autonomy Based on Human-in-the-loop Reinforcement Learning with Policy Constraints

Learning Complementary Policies for Human-AI Teams

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates

Human-AI Shared Control via Policy Dissection

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

Attaining Human's Desirable Outcomes in Human-AI Interaction via Structural Causal Games

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Human-in-the-Loop Deep Reinforcement Learning with Application to Autonomous Driving

Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest

Learning from Active Human Involvement Through Proxy Value Propagation

Prioritized Experience-Based Reinforcement Learning With Human Guidance for Autonomous Driving

Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment