Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendations, decision-makers often exhibit heterogeneous risk preferences under outcome uncertainty, which cannot be well captured by the risk-neutral framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To facilitate the Dynamic Programming Principle and Bellman equation, we enlarge the state space with an additional dimension that accounts for the cumulative reward. We propose a discretized approximation scheme to the MDP under the enlarged state space, which is tractable and key for algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, we match the theoretical lower bounds for the risk-neutral setting.
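
To make the enlarged-state idea concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a known finite-horizon tabular model, deterministic rewards in [0, 1], and a uniform grid of spacing roughly epsilon over the cumulative reward, with each updated cumulative reward snapped to the nearest grid point. The function name `utility_value_iteration` and all parameter names are hypothetical.

```python
import numpy as np

def utility_value_iteration(P, r, H, utility, eps):
    """Backward induction on the augmented state (s, y), y = cumulative reward.

    Assumptions (illustrative only): P has shape (S, A, S) and is known,
    r has shape (S, A) with entries in [0, 1], and `utility` is vectorized.
    """
    S, A, _ = P.shape
    n = int(np.ceil(H / eps)) + 1           # number of grid points covering [0, H]
    grid = np.linspace(0.0, H, n)           # discretized cumulative reward
    step = grid[1] - grid[0]

    # Terminal value: utility of the cumulative reward, identical for every s.
    V = np.tile(utility(grid), (S, 1))      # shape (S, n)
    policy = np.zeros((H, S, n), dtype=int)

    for h in reversed(range(H)):
        Q = np.empty((S, n, A))
        for s in range(S):
            for a in range(A):
                # Snap y + r(s, a) to the nearest grid index, clipped to the grid.
                nxt = np.clip(np.rint((grid + r[s, a]) / step).astype(int), 0, n - 1)
                # Expectation over next states, for every grid value of y at once.
                Q[s, :, a] = P[s, a] @ V[:, nxt]
        policy[h] = Q.argmax(axis=2)
        V = Q.max(axis=2)
    return V, policy, grid
```

Here `V[s, 0]` is the approximate optimal expected utility when starting in state `s` with zero accumulated reward, and `policy[h, s, j]` is the greedy action at step `h` in state `s` with cumulative reward `grid[j]`. With the identity utility the recursion reduces to ordinary risk-neutral value iteration, while a non-linear utility makes the chosen action depend on the reward accumulated so far.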

Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data

Rationality-bounded Adaptive Learning in Multi-Agent Dynamic Games

Inverse Risk-Sensitive Reinforcement Learning

Learning to Play General-Sum Games against Multiple Boundedly Rational Agents

Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

Tractable Equilibrium Computation in Markov Games through Risk Aversion

Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning Via Best Response

Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty

Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms

Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings

Model and Reinforcement Learning for Markov Games with Risk Preferences

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

A Risk-Averse Equilibrium for Multi-Agent Systems

Active Learning for Risk-Sensitive Inverse Reinforcement Learning

Learning under Imitative Strategic Behavior with Unforeseeable Outcomes

Risk-Sensitive Cooperative Games for Human-Machine Systems

Modeling and Interpreting Real-world Human Risk Decision Making with Inverse Reinforcement Learning