Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Jiayi Wang, Zhengling Qi, Chengchun Shi
2023-10-21
Abstract: As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning that takes advantage of Human-AI interaction for data-driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information for the policy search in a novel and legitimate manner, the proposed super reinforcement learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior one (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies using the batch data, a number of nonparametric and causal identifications are established. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties such as finite-sample regret guarantee. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.
Machine Learning, Statistics Theory, Methodology
What problem does this paper attempt to address?
This paper addresses how to improve decision making through interaction between humans and artificial intelligence (AI) in complex environments with unmeasured confounding. Specifically, it proposes a new paradigm called "Super Reinforcement Learning" (Super RL).

### Research Background and Motivation

As AI is increasingly applied in society, it becomes important to integrate human expertise with the capabilities of AI systems so as to leverage their respective strengths and reduce risk. In high-stakes areas such as autonomous driving, medical research, and algorithmic trading, combining AI systems with human knowledge is crucial for making better decisions.

### Problems Addressed

- **Challenges of Reinforcement Learning in Unmeasured Confounding Environments**: Traditional offline reinforcement learning methods rely on datasets of past agent behavior, but unobserved variables (confounders) may prevent the agent from learning the optimal policy.
- **Utilizing Human-AI Interaction**: The authors observe that, in the presence of unmeasured confounding, the actions of past agents can reveal valuable information that is not recorded in the observed variables. If this information is incorporated into the policy search in a novel and legitimate way, a stronger policy can be obtained.

### Main Contributions

1. **Introduction of the Super Reinforcement Learning Paradigm**: A new decision-making paradigm called "Super Reinforcement Learning" is proposed. It uses not only the observed covariates but also the recommendations of behavior agents (whether AI or human) as inputs to the learned policy. The resulting policy is called the "super-policy" and is guaranteed to outperform both the standard optimal policy and the behavior policy.
2. **Addressing Unmeasured Confounding Issues**: To overcome unmeasured confounding when learning super-policies from offline data, the paper establishes a series of nonparametric and causal identification results. Based on these, several super-policy learning algorithms are developed, and their theoretical properties, such as finite-sample regret guarantees, are systematically studied.
3. **Empirical Validation**: Extensive simulation studies and real-world applications demonstrate the effectiveness of the proposed Super Reinforcement Learning method.

### Related Work

- The paper distinguishes itself from other lines of research (such as reward shaping, policy adjustment based on human feedback, and safe reinforcement learning) by using the expertise of behavior agents to uncover unobserved information, thereby enhancing the current decision maker's policy learning.
- Compared with off-policy evaluation (OPE) and learning methods under unmeasured confounding, the distinguishing feature of this paper is that it uses the behavior agents' recommendations for decision making, i.e., it takes the actions recommended by behavior agents as additional features in the decision process.
- The paper also extends work on policy learning and evaluation under partial observability, and it does not require the assumption of no unmeasured confounding.

### Conclusion

In summary, this paper proposes the Super Reinforcement Learning framework, which leverages human-AI interaction to address decision-making problems with unmeasured confounding. Its advantages are demonstrated through theoretical analysis and empirical studies.
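
The super-policy idea described above can be made concrete with a toy example. The sketch below is not the paper's algorithm or its identification strategy. It is a minimal one-step problem, with every number and name chosen here for illustration, in which the true reward model and the unmeasured confounder `U` are known to the simulator so that policy values can be computed exactly. The hypothetical behavior agent sees `U` and is accurate when the observed covariate `S` is 0 but badly biased when `S` is 1. A standard policy may use only `S`, whereas a super-policy may use both `S` and the agent's recommended action `A'`. Because the super-policy class contains every standard policy (ignore `A'`) as well as the behavior policy (copy `A'`), its best member can be no worse than either.

```python
from itertools import product

STATES = (0, 1)       # observed covariate S
CONFOUNDERS = (0, 1)  # unmeasured confounder U (seen only by the behavior agent)
ACTIONS = (0, 1)


def reward(u, a):
    """Expected reward (assumed): 1 when the action matches the hidden U, else 0."""
    return 1.0 if a == u else 0.0


def behavior_prob(a_prime, s, u):
    """P(A' = a_prime | S = s, U = u) for the hypothetical behavior agent.

    Assumed: the agent matches U 90% of the time when S = 0,
    but only 20% of the time when S = 1 (informative yet biased).
    """
    p_match = 0.9 if s == 0 else 0.2
    return p_match if a_prime == u else 1.0 - p_match


def value(policy):
    """Exact expected reward of a policy given as a dict (s, a_prime) -> a."""
    total = 0.0
    for s, u, a_prime in product(STATES, CONFOUNDERS, ACTIONS):
        # S and U are independent fair coins, so P(S = s, U = u) = 0.25.
        prob = 0.25 * behavior_prob(a_prime, s, u)
        total += prob * reward(u, policy[(s, a_prime)])
    return total


inputs = list(product(STATES, ACTIONS))  # all (s, a_prime) pairs

# Super-policy class: every deterministic map from (S, A') to an action.
super_policies = [dict(zip(inputs, acts))
                  for acts in product(ACTIONS, repeat=len(inputs))]

# Standard policy class: the subset that ignores A' and uses S only.
standard_policies = [p for p in super_policies
                     if all(p[(s, 0)] == p[(s, 1)] for s in STATES)]

# The behavior policy is the map that simply copies the recommendation A'.
behavior_value = value({(s, a_prime): a_prime for s, a_prime in inputs})
best_standard = max(value(p) for p in standard_policies)
best_super = max(value(p) for p in super_policies)

print(f"behavior policy:       {behavior_value:.2f}")
print(f"best standard policy:  {best_standard:.2f}")
print(f"best super-policy:     {best_super:.2f}")
```

Under these assumed numbers the script prints 0.55 for the behavior policy, 0.50 for the best standard policy, and 0.85 for the best super-policy, which follows the agent's recommendation when `S = 0` and reverses it when `S = 1`. In real offline data the reward of an action the agent did not take is never observed, so this oracle computation is unavailable; that gap is what the paper's identification results are meant to address.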