Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers
2024-02-05
Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
Machine Learning
What problem does this paper attempt to address?
This paper extends the utility-based approach widely adopted in multi-objective reinforcement learning (MORL) to single-objective reinforcement learning (SORL), with the aim of providing a unified framework for both settings. Specifically, it explores the potential benefits of introducing a utility function in single-objective reinforcement learning, including:

1. **Multi-policy learning**: Learning multiple policies to cover uncertain objectives or risk preferences, which gives the decision-maker more control over the agent's behaviour.
2. **Risk-aware reinforcement learning**: Allowing agents to account for risk when making decisions, rather than only maximizing the expected return.
3. **Discounting**: Treating the discount rate as part of the utility function, so that the optimal policies under different discount rates can be learned simultaneously.
4. **Satisficing agents**: Developing agents that avoid over-optimization in order to improve safety.

The paper also discusses the algorithmic challenges of adopting a utility-based approach, particularly when the utility function is non-linear: the choice of optimization criterion, the non-additivity of value functions, and the implementation of reward shaping. Through these discussions, the paper offers guidance for future research in single-objective reinforcement learning.
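A small numerical sketch may help show why the optimization criterion matters once the utility function is non-linear. The two policies, their return distributions, and the square-root utility below are all hypothetical choices made for illustration; none of them come from the paper. The sketch assumes the two criteria in question are the ones the MORL literature usually distinguishes: applying the utility to the expected return (often called SER) versus taking the expectation of the utility of each episode's return (often called ESR).

```python
import math

# Hypothetical return distributions for two policies in a single-objective
# task. Each entry is (probability, episodic return). Illustrative only.
safe_policy = [(1.0, 9.0)]
risky_policy = [(0.5, 0.0), (0.5, 20.0)]


def utility(ret):
    """Concave (risk-averse) utility. An assumed example, not the paper's."""
    return math.sqrt(ret)


def expected_return(outcomes):
    """Standard RL objective: the mean episodic return."""
    return sum(p * r for p, r in outcomes)


def utility_of_expected_return(outcomes):
    """Apply the utility after taking the expectation (SER-style)."""
    return utility(expected_return(outcomes))


def expected_utility_of_return(outcomes):
    """Apply the utility to each episode's return, then average (ESR-style)."""
    return sum(p * utility(r) for p, r in outcomes)


if __name__ == "__main__":
    for name, policy in [("safe", safe_policy), ("risky", risky_policy)]:
        print(
            name,
            expected_return(policy),
            utility_of_expected_return(policy),
            expected_utility_of_return(policy),
        )
```

With these assumed numbers, the risky policy has the higher expected return and the higher utility of expected return, while the safe policy has the higher expected utility. The two criteria therefore rank the policies differently, and this is the kind of disagreement that can only arise when the utility is non-linear.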