Abstract: This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, its various optimization targets and efficient learning algorithms are still not satisfactorily understood. Our work offers a systematic analysis of several optimization targets, assessing their ability to find all Pareto optimal policies and the controllability they give over learned policies through preferences for different objectives. We then identify Tchebycheff scalarization as a favorable scalarization method for MORL. Because Tchebycheff scalarization is non-smooth, we reformulate its minimization problem into a new min-max-max optimization problem. Then, for the stochastic policy class, we propose efficient algorithms that use this reformulation to learn Pareto optimal policies. We first propose an online UCB-based algorithm that achieves an $\varepsilon$ learning error with an $\tilde{\mathcal{O}}(\varepsilon^{-2})$ sample complexity for a single given preference. To further reduce the cost of environment exploration under different preferences, we propose a preference-free framework that first explores the environment without pre-defined preferences and then generates solutions for any number of preferences. We prove that it requires only an $\tilde{\mathcal{O}}(\varepsilon^{-2})$ exploration complexity in the exploration phase and demands no additional exploration afterward. Lastly, we analyze the smooth Tchebycheff scalarization, an extension of Tchebycheff scalarization, which we prove to be more advantageous in distinguishing Pareto optimal policies from other weakly Pareto optimal policies based on the entry values of preference vectors. Furthermore, we extend our algorithms and theoretical analysis to accommodate this optimization target.
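To make the role of Tchebycheff scalarization concrete, the following is a minimal sketch on a toy problem, not the paper's algorithm. It assumes the standard weighted Tchebycheff form $\max_i \lambda_i (z_i - V_i)$ with a reference point $z$, and a log-sum-exp smoothing with temperature `mu`; the exact formulation in the paper may differ (for example, by a small offset in the reference point). The three candidate value vectors, the reference point, and all function names are hypothetical and chosen only for illustration.

```python
import math

def linear(values, preference):
    """Linear scalarization: weighted sum of objective values (to maximize)."""
    return sum(l * v for l, v in zip(preference, values))

def tchebycheff(values, preference, reference):
    """Weighted Tchebycheff loss max_i lambda_i * (z_i - v_i) (to minimize)."""
    return max(l * (z - v) for l, z, v in zip(preference, reference, values))

def smooth_tchebycheff(values, preference, reference, mu=0.1):
    """Log-sum-exp smoothing of the max above, with temperature mu > 0 (assumed form)."""
    terms = [l * (z - v) / mu for l, z, v in zip(preference, reference, values)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))

# Hypothetical value vectors of three policies under two objectives.
# C is Pareto optimal but lies below the segment joining A and B.
candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}
reference = (1.0, 1.0)    # assumed reference point: best value per objective
preference = (0.5, 0.5)   # equal preference for both objectives

best_lin = max(candidates, key=lambda k: linear(candidates[k], preference))
best_tch = min(candidates, key=lambda k: tchebycheff(candidates[k], preference, reference))
best_stch = min(candidates, key=lambda k: smooth_tchebycheff(candidates[k], preference, reference))

print("linear picks:", best_lin)
print("Tchebycheff picks:", best_tch)
print("smooth Tchebycheff picks:", best_stch)
```

In this toy setting no preference vector makes linear scalarization select C, since its weighted sum is always 0.4 while A or B reaches at least 0.5, whereas the Tchebycheff loss with equal preferences is minimized at C. This illustrates, under the stated assumptions, why a scalarization that can reach every Pareto optimal policy matters.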

PMDRL: Pareto-front-based Multi-Objective Deep Reinforcement Learning

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

A Two-Stage Multi-Objective Deep Reinforcement Learning Framework

C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front

PA2D-MORL: Pareto Ascent Directional Decomposition Based Multi-Objective Reinforcement Learning

Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems

Towards Pareto-optimal energy management in integrated energy systems: A multi-agent and multi-objective deep reinforcement learning approach

Combining a Gradient-Based Method and an Evolution Strategy for Multi-Objective Reinforcement Learning

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

Toward Finding Strong Pareto Optimal Policies in Multi-Agent Reinforcement Learning

Dynamic Programming with Meta-Reinforcement Learning: a Novel Approach for Multi-Objective Optimization

Multi-agent Dueling Q-learning with Mean Field and Value Decomposition

Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning

Distributed Pareto Reinforcement Learning for Multi-objective Smart Generation Control of Multi-area Interconnected Power Systems

Meta-Learning-Based Deep Reinforcement Learning for Multiobjective Optimization Problems

Deep reinforcement learning for multi-objective game strategy selection

MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning

Multi-Objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality

A reinforcement learning approach for dynamic multi-objective optimization

Multiobjective Reinforcement Learning: A Comprehensive Overview