Dynamics-aware Novelty Search with Behavior Repulsion

Kang Xu, Yan Ma, Wei Li
DOI: https://doi.org/10.1145/3512290.3528761
2022-01-01
Abstract: Searching for solutions to tasks with sparse or deceptive rewards is a fundamental problem in Evolutionary Algorithms (EA) and Reinforcement Learning (RL). Existing RL methods enhance exploration by encouraging agents to reach novel states. However, seeking only a single locally optimal solution can be insufficient for tasks with deceptive local optima. Novelty Search (NS) and Quality-Diversity (QD) have shown promising results in finding diverse solutions with different behavioral characteristics. However, the need to manually define a task-specific behavior description limits these methods to low-dimensional tasks. This paper presents Dynamics-aware Novelty Search with Behavior Repulsion (DANSBR), a hybrid algorithm based on the Quality-Diversity paradigm that evolves high-performing solutions by introducing a generalized novelty measurement and a bidirectional gradient-based mutation operator. The novelty of a single solution is defined as the prediction error of an approximate dynamic model in a task-agnostic behavior space. The mutation operator drives a solution either to behave differently or to obtain better performance in a sample-efficient manner. As a result of better exploration, the approach outperforms several baselines on high-dimensional continuous control tasks with sparse rewards. Empirical results also demonstrate that DANSBR improves performance on a task with deceptive rewards.
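The idea of scoring novelty as the prediction error of a learned dynamics model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the abstract does not specify the model class or the behavior space, so a linear least-squares model over (state, action, next state) transitions stands in for the approximate dynamic model, and the function names `fit_linear_dynamics` and `novelty` are hypothetical.

```python
import numpy as np


def _features(states, actions):
    # Concatenate state, action, and a bias column into one design matrix.
    bias = np.ones((len(states), 1))
    return np.hstack([states, actions, bias])


def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of next_state ~ [state, action, 1] @ W.

    Assumption: a linear model is only a stand-in for whatever
    approximate dynamic model the paper actually trains.
    """
    X = _features(states, actions)
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W


def novelty(W, states, actions, next_states):
    """Mean squared one-step prediction error over a rollout.

    A rollout the model predicts well scores low; a rollout whose
    transitions the model has not seen scores high.
    """
    err = _features(states, actions) @ W - next_states
    return float(np.mean(np.sum(err ** 2, axis=1)))
```

Under these assumptions, a solution whose rollout resembles the data the model was fitted on gets a novelty near zero, while a solution that produces unfamiliar transitions gets a larger score, which is the signal a novelty-driven search would favor.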