Distributional Pareto-Optimal Multi-Objective Reinforcement Learning

Xin-Qiang Cai, Pushi Zhang, Li Zhao, Jiang Bian, Masashi Sugiyama, Ashley Llorens
2023-01-01
Abstract: Multi-objective reinforcement learning (MORL) has been proposed to learn control policies over multiple competing objectives with each possible preference over returns. However, current MORL algorithms fail to account for distributional preferences over the multi-variate returns, which are particularly important in real-world scenarios such as autonomous driving. To address this issue, we extend the concept of Pareto-optimality in MORL into distributional Pareto-optimality, which captures the optimality of return distributions, rather than the expectations. Our proposed method, called Distributional Pareto-Optimal Multi-Objective Reinforcement Learning (DPMORL), is capable of learning distributional Pareto-optimal policies that balance multiple objectives while considering the return uncertainty. We evaluated our method on several benchmark problems and demonstrated its effectiveness in discovering distributional Pareto-optimal policies and satisfying diverse distributional preferences compared to existing MORL methods.
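The contrast the abstract draws between comparing expectations and comparing return distributions can be illustrated with a small sketch. This is not the DPMORL algorithm and does not implement the paper's definition of distributional Pareto-optimality; it only shows, under assumed toy data, why expected returns alone can hide differences in return uncertainty. The policy names (`steady`, `risky`), the two objectives, and the helper functions are all hypothetical.

```python
from statistics import mean, pvariance

def expected_return(samples):
    """Per-objective mean of a list of multi-objective return vectors."""
    return tuple(mean(obj) for obj in zip(*samples))

def pareto_dominates(a, b):
    """Standard Pareto dominance on vectors: a is at least as good as b
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical sampled returns for two objectives from two policies.
steady = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
risky = [(10.0, 0.0), (0.0, 10.0), (10.0, 0.0), (0.0, 10.0)]

mean_steady = expected_return(steady)
mean_risky = expected_return(risky)

# On expectations alone, neither policy dominates the other.
tie = (not pareto_dominates(mean_steady, mean_risky)
       and not pareto_dominates(mean_risky, mean_steady))

# Per-objective variance exposes the difference that the means hide.
spread_steady = tuple(pvariance(obj) for obj in zip(*steady))
spread_risky = tuple(pvariance(obj) for obj in zip(*risky))

print(mean_steady, mean_risky, tie)
print(spread_steady, spread_risky)
```

Both toy policies have identical expected returns, so an expectation-based comparison treats them as equivalent, while their return distributions differ sharply in spread. A preference that is sensitive to uncertainty would need to compare the distributions themselves, which is the gap the abstract describes.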
Computer Science