A Two-Stage Multi-Objective Deep Reinforcement Learning Framework.

Diqi Chen, Yizhou Wang, Wen Gao
DOI: https://doi.org/10.3233/faia200202
2020-01-01
Abstract: In multi-objective decision-making problems, multi-objective reinforcement learning (MORL) algorithms aim to approximate the Pareto frontier uniformly. A naive approach learns multiple policies by repeatedly running a single-objective reinforcement learning (RL) algorithm on scalarized rewards, where each run uses a different scalarization that encodes a different preference over the objectives. This makes both the model representation and the computation redundant. Furthermore, uniformly spaced preferences cannot guarantee a uniformly approximated Pareto frontier. To address these problems and leverage the expressive power of deep neural networks, we propose a two-stage MORL framework that integrates a multi-policy deep RL algorithm with an evolution strategy algorithm. In the first stage, a multi-policy soft actor-critic algorithm collaboratively learns multiple policies, each assigned a different scalarization weight. The lower layers of all policy networks are shared, and the first stage can be regarded as representation learning. In the second stage, the multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) fine-tunes the policy-independent parameters to approach a dense and uniform estimate of the Pareto frontier. Experimental results on two benchmarks (Deep Sea Treasure and Adaptive Streaming) show the superiority of the proposed method.
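The claim that uniform preferences do not guarantee a uniformly covered Pareto frontier can be illustrated with a small sketch. It is not taken from the paper: the helper names (`scalarize`, `pareto_front`, `uniform_weights`) and the candidate returns are hypothetical, and linear (weighted-sum) scalarization is assumed as the scalarization method. Each tuple stands for the two-objective return of one candidate policy, with larger values being better.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scalarize(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of the per-objective rewards."""
    if len(reward) != len(weights):
        raise ValueError("reward and weights must have the same length")
    return sum(w * r for w, r in zip(weights, reward))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def uniform_weights(n: int) -> List[Point]:
    """n evenly spaced weight pairs (w, 1 - w) for two objectives, n >= 2."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


# Hypothetical returns of four candidate policies (illustrative data only).
# (4, 5) is Pareto-optimal but lies in a concave region of the frontier;
# (2, 2) is dominated by (4, 5).
candidates: List[Point] = [(0.0, 10.0), (4.0, 5.0), (10.0, 0.0), (2.0, 2.0)]

front = pareto_front(candidates)

# For each uniformly spaced weight, pick the candidate with the best
# scalarized return, as a separate single-objective run would.
selected = {
    max(candidates, key=lambda p: scalarize(p, w)) for w in uniform_weights(4)
}

missed = [p for p in front if p not in selected]
print("Pareto front:", front)
print("Selected by uniform weights:", sorted(selected))
print("Pareto-optimal but never selected:", missed)
```

In this toy setting the four evenly spaced weights only ever select the two extreme points, so the Pareto-optimal point in the middle is never recovered. This is the kind of gap that motivates a second stage which searches for a denser, more uniform frontier rather than relying on the weight grid alone.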