Monte Carlo Elites: Quality-Diversity Selection as a Multi-Armed Bandit Problem

Konstantinos Sfikas, Antonios Liapis, Georgios N. Yannakakis
DOI: https://doi.org/10.1145/3449639.3459321
2021-04-18
Abstract: A core challenge of evolutionary search is the need to balance between exploration of the search space and exploitation of highly fit regions. Quality-diversity search has explicitly walked this tightrope between a population's diversity and its quality. This paper extends a popular quality-diversity search algorithm, MAP-Elites, by treating the selection of parents as a multi-armed bandit problem. Using variations of the upper-confidence bound to select parents from under-explored but potentially rewarding areas of the search space can accelerate the discovery of new regions as well as improve its archive's total quality. The paper tests an indirect measure of quality for parent selection: the survival rate of a parent's offspring. Results show that maintaining a balance between exploration and exploitation leads to the most diverse and high-quality set of solutions in three different testbeds.
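For reference, the classic UCB1 rule from the bandit literature scores each arm i as below. Here an arm is taken to be an archive elite (or cell) and the reward is its offspring survival rate. This is only the standard textbook form, not necessarily any of the paper's variations, whose exact terms and constants may differ. The symbols s_i and c are introduced here for illustration.

$$\mathrm{UCB}_i = \frac{s_i}{n_i} + c\sqrt{\frac{\ln N}{n_i}}$$

Here $s_i$ is the number of offspring of $i$ that entered the archive, $n_i$ is the number of times $i$ has been selected as a parent, $N = \sum_j n_j$ is the total number of selections so far, and $c$ is an exploration constant. The first term (the survival rate) drives exploitation; the second term grows for rarely selected parents and drives exploration. Choosing $c = \sqrt{2}$ recovers UCB1's usual $\sqrt{2 \ln N / n_i}$ bonus, and arms with $n_i = 0$ are conventionally tried first.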
Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses how to balance exploration and exploitation in quality-diversity (QD) search. Specifically, it applies selection strategies from the multi-armed bandit (MAB) problem to parent selection in QD algorithms, with the aim of improving both search efficiency and solution quality. By treating the choice of parents as an MAB problem and using the upper confidence bound (UCB) formula to guide that choice, the paper aims to accelerate the discovery of new regions of the search space and to raise the overall quality of the solutions stored in the archive.

The main contribution is a set of UCB-based parent selection mechanisms for MAP-Elites, evaluated on three different testbeds. These mechanisms account both for how often an individual or cell has been selected (the exploration term) and for the survival rate of its offspring, i.e. how often those offspring enter the archive (the exploitation term). Experimental results show that, compared with the uniform random parent selection of standard MAP-Elites, the proposed methods perform better on the established QD evaluation metrics, and the variants that balance exploration and exploitation yield the most diverse, high-quality sets of solutions.
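A minimal sketch of how UCB-based parent selection can be wired into a MAP-Elites loop, assuming the standard UCB1 score with c = sqrt(2) and the offspring survival rate as reward. Everything here is hypothetical and chosen for illustration: the `Elite` record, the helper names, the 2-D toy problem (fitness peaks at (0.5, 0.5), descriptor is the genome binned on a 10x10 grid) and the Gaussian mutation. Statistics are kept per individual; this is not the authors' implementation and does not reproduce any specific variant from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Elite:
    """Archive entry: a solution plus bandit statistics for parent selection."""
    genome: list[float]
    fitness: float
    times_selected: int = 0      # n_i: how often this elite was chosen as a parent
    offspring_survived: int = 0  # s_i: offspring that entered the archive


def survival_rate(elite: Elite) -> float:
    """Empirical reward: fraction of this elite's offspring that survived."""
    if elite.times_selected == 0:
        return 0.0
    return elite.offspring_survived / elite.times_selected


def ucb_score(elite: Elite, total_selections: int, c: float = math.sqrt(2)) -> float:
    """UCB1-style score: survival rate (exploitation) plus exploration bonus."""
    if elite.times_selected == 0:
        return math.inf  # untried parents are selected first
    bonus = c * math.sqrt(math.log(total_selections) / elite.times_selected)
    return survival_rate(elite) + bonus


def select_parent(archive: dict, total_selections: int) -> tuple:
    """Return the cell key whose elite has the highest UCB score (random tie-break)."""
    scores = {key: ucb_score(e, total_selections) for key, e in archive.items()}
    best = max(scores.values())
    return random.choice([k for k, s in scores.items() if s == best])


def descriptor(genome: list[float], bins: int = 10) -> tuple:
    """Hypothetical behavior descriptor: the genome discretized onto a grid."""
    return tuple(min(int(g * bins), bins - 1) for g in genome)


def fitness(genome: list[float]) -> float:
    """Hypothetical toy fitness, maximized at (0.5, 0.5)."""
    return -sum((g - 0.5) ** 2 for g in genome)


def try_insert(archive: dict, genome: list[float]) -> bool:
    """MAP-Elites insertion: keep the genome if its cell is empty or it is fitter."""
    key = descriptor(genome)
    f = fitness(genome)
    incumbent = archive.get(key)
    if incumbent is None or f > incumbent.fitness:
        archive[key] = Elite(genome, f)  # a replacing elite starts with fresh counts
        return True
    return False


def map_elites_ucb(iterations: int = 2000, init: int = 20,
                   sigma: float = 0.1, seed: int = 0) -> dict:
    random.seed(seed)
    archive: dict = {}
    for _ in range(init):
        try_insert(archive, [random.random(), random.random()])
    total = 0  # N: total parent selections so far
    for _ in range(iterations):
        parent = archive[select_parent(archive, total)]
        # Gaussian mutation, clamped to the unit square
        child = [min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in parent.genome]
        parent.times_selected += 1
        total += 1
        if try_insert(archive, child):
            parent.offspring_survived += 1  # the offspring "survived"
    return archive


if __name__ == "__main__":
    archive = map_elites_ucb()
    print(f"{len(archive)} of 100 cells filled")
```

Because the counts live on each `Elite`, they reset whenever an elite is replaced in its cell. A cell-level variant would instead keep `times_selected` and `offspring_survived` keyed by the cell tuple, so that statistics persist across replacements. Swapping `select_parent` for `random.choice(list(archive))` gives the uniform baseline for comparison.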