Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and in theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings, where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendation, decision-makers often hold heterogeneous risk preferences toward outcome uncertainty, which cannot be well captured by the risk-neutral framework. These preferences can be incorporated through utility theory, yet risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a decision-maker who seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To make the dynamic programming principle and the Bellman equation applicable, we enlarge the state space with an additional dimension that tracks the cumulative reward. We propose a discretized approximation scheme for the MDP on this enlarged state space, which is tractable and key to the algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound (UCB) exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, the guarantees match the theoretical lower bounds for the risk-neutral setting.
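To make the state-enlargement idea concrete, the sketch below runs backward induction on pairs (s, y), where y is the cumulative reward collected so far and is restricted to a grid of width eps over [0, H]. The recursion is V_H(s, y) = U(y) and V_h(s, y) = max_a sum_{s'} P_h(s' | s, a) V_{h+1}(s', proj(y + r_h(s, a))). This is a minimal sketch under loudly stated assumptions, not the authors' algorithm: it assumes a known tabular finite-horizon model with deterministic rewards in [0, 1], uses a round-down projection onto the grid, and does not reproduce the simulator-based sampling or the UCB exploration. The names `augmented_value_iteration`, `build_toy_mdp`, and the toy MDP itself are hypothetical illustrations.

```python
import math

import numpy as np


def augmented_value_iteration(P, R, utility, eps):
    """Backward induction on the enlarged state (s, y), y = cumulative reward so far.

    Hypothetical sketch, not the paper's algorithm. Assumes:
      P[h, s, a, s'] : known transition probabilities, shape (H, S, A, S)
      R[h, s, a]     : deterministic rewards in [0, 1], shape (H, S, A)
      utility        : callable U applied to the final cumulative reward
      eps            : width of the grid (epsilon-covering) over [0, H]
    """
    H, S, A, _ = P.shape
    grid = np.arange(0.0, H + eps / 2, eps)  # points 0, eps, 2*eps, ... (H assumed a multiple of eps)
    n_y = len(grid)

    def project(y):
        # Index of the grid point just below y (round-down is an assumption),
        # capped at the top grid point.
        return min(int(math.floor(y / eps + 1e-9)), n_y - 1)

    V = np.empty((H + 1, S, n_y))
    V[H] = np.array([utility(y) for y in grid])  # terminal value U(y), same for every state
    policy = np.zeros((H, S, n_y), dtype=int)

    for h in range(H - 1, -1, -1):
        for s in range(S):
            for k, y in enumerate(grid):
                # Q(s, y, a) = E_{s'}[ V_{h+1}(s', proj(y + r_h(s, a))) ]
                q = np.array([
                    P[h, s, a] @ V[h + 1, :, project(y + R[h, s, a])]
                    for a in range(A)
                ])
                policy[h, s, k] = int(np.argmax(q))
                V[h, s, k] = q.max()
    return V, policy, grid


def build_toy_mdp():
    """Hypothetical 2-step example: a safe action vs. a fair gamble with the same mean."""
    H, S, A = 2, 3, 2
    P = np.zeros((H, S, A, S))
    P[0, 0, 0, 1] = 1.0                  # action 0: move to state 1 (pays 0.5 at step 1)
    P[0, 0, 1, 0] = P[0, 0, 1, 2] = 0.5  # action 1: coin flip, state 0 (pays 0) or state 2 (pays 1)
    for h in range(H):
        for s in range(S):
            for a in range(A):
                if P[h, s, a].sum() == 0:
                    P[h, s, a, s] = 1.0  # unused entries: stay in place
    R = np.zeros((H, S, A))
    R[1, 1, :] = 0.5
    R[1, 2, :] = 1.0
    return P, R


if __name__ == "__main__":
    P, R = build_toy_mdp()
    for name, u in [("risk-averse sqrt", math.sqrt), ("risk-seeking square", lambda y: y ** 2)]:
        V, pi, _ = augmented_value_iteration(P, R, u, eps=0.5)
        print(name, "first action:", pi[0, 0, 0], "value:", round(float(V[0, 0, 0]), 4))
```

Both first-step actions in the toy MDP have expected cumulative reward 0.5, so a risk-neutral agent is indifferent between them. With the concave utility sqrt(y) the sketch picks the safe action 0 (value about 0.7071), while the convex utility y**2 picks the gamble, action 1 (value 0.5), which is the kind of preference-dependent behavior the enlarged state is meant to capture. Because each step rounds y down by less than eps, the tracked cumulative reward undershoots the true one by less than H * eps under these assumptions, so a finer covering buys accuracy at the cost of a larger augmented state space.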

Risk-sensitive Markov Decision Process and Learning under General Utility Functions