Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendations, decision-makers often exhibit heterogeneous risk preferences under outcome uncertainty, which cannot be well captured by the risk-neutral framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To facilitate the Dynamic Programming Principle and the Bellman equation, we enlarge the state space with an additional dimension that accounts for the cumulative reward. We propose a discretized approximation scheme for the MDP under the enlarged state space, which is tractable and key to algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, we match the theoretical lower bounds for the risk-neutral setting.
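The enlarged-state idea in the abstract can be illustrated with a small planning sketch. It is not the paper's algorithm: it assumes a known finite-horizon tabular MDP with deterministic rewards in [0, 1], and it rounds the accumulated reward to the nearest point of a uniform grid of spacing roughly `eps` (one simple way to realize a covering of [0, H]). The function name `utility_value_iteration` and all parameter names are hypothetical, and no sampling or upper-confidence-bound exploration is included.

```python
import numpy as np

def utility_value_iteration(P, r, H, utility, eps):
    """Backward induction on the enlarged state (s, y), y = cumulative reward.

    P: (S, A, S) transition probabilities, r: (S, A) rewards in [0, 1],
    H: horizon, utility: vectorized function of the cumulative reward,
    eps: target grid spacing (assumed discretization, not from the paper).
    """
    S, A = r.shape
    K = int(np.ceil(H / eps))
    grid = np.linspace(0.0, H, K + 1)      # grid over cumulative reward in [0, H]
    step = grid[1] - grid[0]
    # Terminal value: utility of the total reward, identical for every state s.
    V = np.tile(utility(grid), (S, 1))     # shape (S, K + 1)
    policy = np.zeros((H, S, K + 1), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = np.empty((S, K + 1, A))
        for s in range(S):
            for a in range(A):
                # Index of the grid point nearest to y + r(s, a), for every y.
                nxt = np.clip(np.rint((grid + r[s, a]) / step).astype(int), 0, K)
                # Expectation over the next state s' of V(s', projected y).
                Q[s, :, a] = P[s, a] @ V[:, nxt]
        policy[h] = Q.argmax(axis=2)
        V = Q.max(axis=2)
    return V, policy, grid

# Toy usage: one state, two actions with rewards 0.5 and 1.0, horizon 2,
# and a risk-neutral (identity) utility.
P = np.ones((1, 2, 1))
r = np.array([[0.5, 1.0]])
V, policy, grid = utility_value_iteration(P, r, H=2, utility=lambda y: y, eps=0.5)
```

The optimal value from the initial state is read off at `V[s0, 0]`, since the cumulative reward starts at zero. The resulting policy depends on both the state and the discretized accumulated reward, which is what a general (non-linear) utility requires.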

Risk-Averse Bayes-Adaptive Reinforcement Learning

Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

Provably Efficient CVaR RL in Low-rank MDPs

Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

On the Maximization of Long-Run Reward CVaR for Markov Decision Processes

Risk‐sensitive Markov decision processes with long‐run CVaR criterion

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Towards Safe Reinforcement Learning Via Constraining Conditional Value-at-Risk

Risk-Sensitive Reinforcement Learning with Exponential Criteria

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Risk-Aware Distributed Multi-Agent Reinforcement Learning

Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path

Risk-Sensitive Reinforcement Learning: Iterated CVaR and the Worst Path.

Constrained Risk-Averse Markov Decision Processes

Risk Sensitive Markov Decision Process for Portfolio Management

Distributional Method for Risk Averse Reinforcement Learning