Prediction Guided Meta-Learning for Multi-Objective Reinforcement Learning

Fei-Yu Liu, Chao Qian
DOI: https://doi.org/10.1109/cec45853.2021.9504972
2021-01-01
Abstract: Many real-world control problems involve several different, possibly conflicting, objectives and therefore require finding a high-quality set of policies that are optimal for different objective preferences. Most existing research has focused on obtaining a high-quality approximate Pareto set of policies, while another important line of research studies how to adapt quickly to new objective preferences. In this paper, we propose a new multi-objective reinforcement learning (MORL) algorithm, called PG-Meta-MORL, that achieves both goals. PG-Meta-MORL frames MORL as a meta-learning problem and iteratively optimizes a meta-policy using multiple tasks whose objective preferences are selected by a prediction model; this model is trained to steer the optimization toward the largest improvement in the quality of the current Pareto set of policies. Empirical results on several multi-objective continuous control problems show that PG-Meta-MORL finds a high-quality approximate Pareto set of policies, and that the resulting meta-policy adapts well to new objective preferences after only a few interactions with the environment.
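The abstract does not specify the prediction model or the selection criterion, so the following is only a minimal sketch of the general idea of prediction-guided preference selection, not the authors' implementation. It assumes two objectives that are both maximized, linear scalarization of the vector reward by a preference weight vector (so each preference defines one single-objective task), hypervolume with respect to a reference point as the measure of Pareto-set quality, and a toy predictor. All function names (`scalarize`, `pareto_front`, `hypervolume_2d`, `predict_next`, `select_tasks`) are hypothetical, and `predict_next` in particular is a stand-in for a learned model.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def scalarize(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: a preference vector turns a vector reward into a scalar task reward."""
    return sum(r * w for r, w in zip(reward, weights))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (both objectives maximized), deduplicated and sorted."""
    front = [
        p for p in points
        if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
    ]
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded below by the reference point."""
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in front:  # along the front, x decreases while y increases
        if x <= ref[0] or y <= prev_y:
            continue  # contributes no area inside the reference box
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv


def predict_next(point: Point, weights: Sequence[float], step: float = 1.0) -> Point:
    """HYPOTHETICAL predictor: assumes that training a policy under `weights`
    moves its objective vector by `step` along the preference direction."""
    return (point[0] + step * weights[0], point[1] + step * weights[1])


def select_tasks(
    population: List[Point], candidates: List[Point], ref: Point, k: int
) -> List[Tuple[int, Point]]:
    """Greedily pick up to k (policy index, preference) tasks whose predicted
    outcomes add the most hypervolume to the current Pareto set."""
    current = list(population)
    chosen: List[Tuple[int, Point]] = []
    for _ in range(k):
        base = hypervolume_2d(current, ref)
        best_gain, best_task, best_pred = 0.0, None, None
        for i, p in enumerate(population):
            for w in candidates:
                pred = predict_next(p, w)
                gain = hypervolume_2d(current + [pred], ref) - base
                if gain > best_gain:  # strict comparison: ties keep the first pair found
                    best_gain, best_task, best_pred = gain, (i, w), pred
        if best_task is None:  # no predicted improvement remains
            break
        chosen.append(best_task)
        current.append(best_pred)  # later picks account for this predicted gain
    return chosen


if __name__ == "__main__":
    pop = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]  # objective vectors of current policies
    prefs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # candidate preference weights
    print(hypervolume_2d(pop, (0.0, 0.0)))  # 6.0
    print(select_tasks(pop, prefs, (0.0, 0.0), k=1))  # [(0, (0.5, 0.5))]
```

In this toy setting the balanced preference is chosen because its predicted outcome fills the largest gap in the front. In a meta-learning setup such as the one the abstract describes, the selected preferences would define the tasks used to update the meta-policy; that update step is not modeled here.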