Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

2024-02-05

Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.

Machine Learning

What problem does this paper attempt to address?

This paper extends the utility-function approach widely adopted in multi-objective reinforcement learning (MORL) to single-objective reinforcement learning (SORL), with the aim of providing a unified framework for both kinds of problem. Specifically, it explores the potential benefits of introducing a utility function in single-objective reinforcement learning, including:

1. **Multi-policy learning**: Learning multiple policies to handle uncertain objectives or risk preferences, which gives the decision-maker greater control.
2. **Risk-aware reinforcement learning**: Allowing agents to account for risk when making decisions, rather than only maximizing the expected return.
3. **The influence of the discount rate**: Treating the discount rate as part of the utility function, so that the optimal policies under different discount rates can be learned simultaneously.
4. **Satisficing agents**: Developing agents that avoid over-optimization in order to improve safety.

The paper also discusses challenges that arise when implementing algorithms under the utility-based approach. With non-linear utility functions in particular, the choice of optimization criterion, the non-additivity of value functions, and the implementation of reward shaping all need to be considered. Through these discussions, the paper provides guidance for future research in single-objective reinforcement learning.
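To make the point about optimization criteria concrete, here is a minimal sketch of why the criterion matters once the utility function is non-linear. It is not taken from the paper: the two return distributions, the square-root utility, and all function names are assumptions chosen for illustration. It contrasts two criteria commonly distinguished in the MORL literature: averaging the utility of each return (expected scalarized returns, ESR) and applying the utility to the average return (scalarized expected returns, SER).

```python
import math

# Hypothetical return distributions for two policies, as (return, probability) pairs.
SAFE = [(5.0, 1.0)]                # always yields a return of 5
RISKY = [(0.0, 0.5), (12.0, 0.5)]  # yields 0 or 12 with equal probability


def expected_return(dist):
    """Plain expected return, the usual single-objective criterion."""
    return sum(prob * ret for ret, prob in dist)


def expected_utility(dist, utility):
    """ESR-style criterion: apply the utility to each return, then average."""
    return sum(prob * utility(ret) for ret, prob in dist)


def utility_of_expectation(dist, utility):
    """SER-style criterion: average the returns, then apply the utility."""
    return utility(expected_return(dist))


def risk_averse(ret):
    """Concave utility, assumed here only to model a risk-averse user."""
    return math.sqrt(ret)


if __name__ == "__main__":
    for name, dist in (("safe", SAFE), ("risky", RISKY)):
        print(
            name,
            round(expected_return(dist), 3),
            round(expected_utility(dist, risk_averse), 3),
            round(utility_of_expectation(dist, risk_averse), 3),
        )
```

Under these assumed numbers, the risky policy has the higher expected return and the higher utility of the expected return, while the safe policy has the higher expected utility. With a linear utility the three rankings would coincide, which is why the choice of criterion only becomes an issue for non-linear utilities.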


UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment

A Two-Stage Multi-Objective Deep Reinforcement Learning Framework

Actor-critic multi-objective reinforcement learning for non-linear utility functions

Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Value function interference and greedy action selection in value-based multi-objective reinforcement learning

An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning

C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front

Demonstration Guided Multi-Objective Reinforcement Learning

Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization

Continual Multi-Objective Reinforcement Learning Via Reward Model Rehearsal

Multi-objective multi-agent decision making: a utility-based analysis and survey

Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning

Hyperparameter Optimization for Multi-Objective Reinforcement Learning

Provable Multi-Objective Reinforcement Learning with Generative Models

Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies

Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning