Abstract: Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previously dominant MORL methods typically generate a fixed policy set or a preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, particularly as the dimensions of the state and preference spaces grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each policy optimized towards its individual preference over the multiple objectives. Then, to fill the remaining gaps in the Pareto front, constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent MORL methods, our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).
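
To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes every objective is maximized and that each policy is summarized by a tuple of expected returns. The function names (`pareto_front`, `hypervolume_2d`, `sparsity`, `best_feasible`) are hypothetical. The hypervolume routine handles two objectives only, and the sparsity formula follows one commonly used definition (mean squared gap between neighbouring solutions, summed over objectives), which may differ in detail from the one used in the paper. `best_feasible` only illustrates the shape of the constrained step (maximize one objective subject to thresholds on the others) by selecting among already-evaluated return vectors; it does not perform any policy optimization.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]  # one return vector, one entry per objective


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep only the non-dominated points (duplicates removed, order preserved)."""
    unique = list(dict.fromkeys(tuple(p) for p in points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by ref (two objectives only)."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    front.sort(key=lambda p: p[0], reverse=True)  # x descending, so y ascends
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)  # new horizontal strip added by this point
        prev_y = y
    return area


def sparsity(points: Sequence[Point]) -> float:
    """Mean squared gap between neighbouring front points, summed over objectives."""
    front = pareto_front(points)
    if len(front) < 2:
        return 0.0
    total = 0.0
    for j in range(len(front[0])):
        vals = sorted(p[j] for p in front)
        total += sum((b - a) ** 2 for a, b in zip(vals, vals[1:]))
    return total / (len(front) - 1)


def best_feasible(
    candidates: Sequence[Point], target: int, thresholds: Sequence[float]
) -> Optional[Point]:
    """Pick the candidate maximizing objective `target` among those whose other
    objectives all meet their thresholds; return None if none is feasible."""
    feasible = [
        p for p in candidates
        if all(p[i] >= thresholds[i] for i in range(len(p)) if i != target)
    ]
    return max(feasible, key=lambda p: p[target]) if feasible else None


if __name__ == "__main__":
    returns = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
    print(pareto_front(returns))
    print(hypervolume_2d(returns, ref=(0.0, 0.0)))
    print(sparsity(returns))
    print(best_feasible(returns, target=0, thresholds=(0.0, 2.0)))
```

A larger hypervolume indicates a front that dominates more of the objective space, while a lower sparsity indicates a denser, more evenly covered front. Filling gaps in the front, as the second stage aims to do, would be expected to move both metrics in the favourable direction.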

C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
