PMDRL: Pareto-front-based Multi-Objective Deep Reinforcement Learning

Fangjie Yang, Honglan Huang, Wei Shi, Yang Ma, Yanghe Feng, Guangquan Cheng, Zhong Liu
DOI: https://doi.org/10.1007/s12652-022-04232-x
2022-01-01
Abstract: Most reinforcement learning research aims to optimize an agent's policy for a single objective. However, many real-world applications are inherently characterized by multiple, possibly conflicting, objectives. As a generalization of standard reinforcement learning, multi-objective reinforcement learning addresses the need for trade-offs between competing objectives. Instead of single-policy techniques, which rely on heuristic information such as reward shaping, we propose a novel reinforcement learning method that learns a policy without a predefined preference over the objectives. We argue that combining Pareto optimality theory with the deep Q-network is a powerful way to avoid constructing a synthetic reward function. The method applies non-dominated sorting to obtain the Pareto front set, computed simultaneously, without assuming any weighting function or linear procedure for selecting an action. We provide theoretical guarantees for the proposed method in the Grid World experiment. Experiments on multi-objective Cartpole show that our approach achieves better performance, faster convergence, relatively good stability, and more diverse solutions than the traditional multi-objective deep Q-network.
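The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of the general idea of preference-free, Pareto-based action selection. It assumes that each action has a vector of per-objective Q-values (as a multi-objective Q-network head might output) and that all objectives are maximized. Ties among non-dominated actions are broken uniformly at random, which is an assumption made here for illustration. The names `dominates`, `non_dominated_sort`, and `select_action` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if vector a Pareto-dominates b (maximization):
    a is >= b in every objective and strictly > in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(vectors: List[Sequence[float]]) -> List[List[int]]:
    """Split the indices of `vectors` into successive Pareto fronts.

    Front 0 holds the indices no other vector dominates; front 1 holds the
    ones that become non-dominated once front 0 is removed, and so on.
    """
    remaining = list(range(len(vectors)))
    fronts: List[List[int]] = []
    while remaining:
        # An index is on the current front if no other remaining vector dominates it.
        front = [
            i for i in remaining
            if not any(dominates(vectors[j], vectors[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def select_action(q_values: List[Sequence[float]], rng: random.Random) -> int:
    """Hypothetical greedy step: pick uniformly among actions on the first
    Pareto front of the per-objective Q-values, with no weights or
    linear scalarization of the objectives."""
    return rng.choice(non_dominated_sort(q_values)[0])


# Four actions, two objectives: action 3 is dominated by action 2.
q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.4, 0.4]]
print(non_dominated_sort(q))  # [[0, 1, 2], [3]]
print(select_action(q, random.Random(0)) in {0, 1, 2})  # True
```

Selecting among the whole first front, rather than collapsing the Q-vector to a weighted sum, is what keeps the choice free of a preference over objectives. In a training loop, an ε-greedy wrapper would typically replace the greedy step with a random action with probability ε.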