APD: Learning Diverse Behaviors for Reinforcement Learning Through Unsupervised Active Pre-Training

Kailin Zeng, QiYuan Zhang, Bin Chen, Bin Liang, Jun Yang
DOI: https://doi.org/10.1109/lra.2022.3214057
IF: 5.2
2022-10-01
IEEE Robotics and Automation Letters
Abstract: Unsupervised pre-training in reinforcement learning enables the agent to gain prior environmental knowledge, which is then fine-tuned in the supervised stage to quickly adapt to various downstream tasks. In the absence of task-related rewards, pre-training aims to acquire policies (i.e., behaviors) that generate different trajectories to explore and master the environment. Previous research categorizes states into their associated behaviors by learning a supervised discriminator. However, an underlying problem persists: such a discriminator is trained without sufficient relevant data, leading to an underestimation of reward for new states and inadequate exploration. To this end, we introduce an unsupervised active pre-training algorithm for diverse behavior induction (APD). We explicitly characterize the behavior variables with a state-dependent sampling method, so that the agent can decompose the entire state space into parts for fine-grained and diverse behavior learning. Specifically, a particle-based entropy estimator is applied to optimize an objective that combines behavioral entropy and mutual information. Moreover, we develop behavior-based representation learning to compress states into the latent space. Experiments show that our method improves exploration efficiency and outperforms most state-of-the-art unsupervised algorithms on a number of continuous control tasks in the DeepMind Control Suite.
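To illustrate what a particle-based entropy estimator can look like, the sketch below scores each state by the log of its distance to its k-th nearest neighbor in a batch, so states in sparsely visited regions receive larger intrinsic rewards. This is a generic k-nearest-neighbor formulation and only an assumption about the general idea; the abstract does not give APD's exact estimator, its behavior variables, or its mutual information term, and the function name `knn_entropy_rewards` and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence


def knn_entropy_rewards(states: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Generic k-NN particle-based intrinsic reward (illustrative, not APD's exact form).

    Each state is rewarded with log(1 + d_k), where d_k is the Euclidean
    distance to its k-th nearest neighbor among the other states in the batch.
    """
    if not 1 <= k < len(states):
        raise ValueError("k must satisfy 1 <= k < number of states")

    rewards = []
    for i, state in enumerate(states):
        # Distances from this state to every other state, in ascending order.
        dists = sorted(
            math.dist(state, other) for j, other in enumerate(states) if j != i
        )
        # log1p keeps the reward finite and non-negative even for duplicate states.
        rewards.append(math.log1p(dists[k - 1]))
    return rewards


# Four states clustered near the origin and one far-away state.
batch = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
rewards = knn_entropy_rewards(batch, k=1)
```

In this toy batch the isolated state at (5, 5) receives the largest reward, which is the property that pushes an agent toward unvisited regions. In practice such estimators are usually computed over learned latent representations rather than raw states.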
robotics