Multi-Objective Reinforcement Learning with Non-Linear Scalarization.

Mridul Agarwal,Vaneet Aggarwal,Tian Lan
2022-01-01
Abstract:Multi-Objective Reinforcement Learning (MORL) setup naturally arises in many places where an agent optimizes multiple objectives. We consider the problem of MORL where multiple objectives are combined using a non-linear scalarization. We combine the vector objectives with a concave scalarization function and maximize this scalar objective. To work with the non-linear scalarization, in this paper, we propose a solution using steady-state occupancy measures and long-term average rewards. We show that when the scalarization function is element-wise increasing, the optimal policy for the scalarization is also Pareto optimal. To maximize the scalarized objective, we propose a model-based posterior sampling algorithm. Using a novel Bellman error analysis for infinite horizon MDPs based proof, we show that the proposed algorithm obtains a regret bound of ~O(LKDS√A/T) for K objectives, and L-Lipschitz continous scalarization function for MDP with S states, A actions, and diameter D. Additionally, we propose policy-gradient and actor-critic algorithms for MORL. For the policy gradient actor, we obtain the gradient using chain rule, and we learn different critics for each of the K objectives. Finally, we implement our algorithms on multiple environments including deep-sea treasure, and network scheduling setups to demonstrate that the proposed algorithms can optimize non-linear scalarization of multiple objectives.
What problem does this paper attempt to address?