Tuning Reinforcement Learning Parameters for Cluster Selection to Enhance Evolutionary Algorithms

Nathan Villavicencio, Michael N Groves
DOI: https://doi.org/10.1021/acsengineeringau.3c00068
2024-04-16
Abstract: The ability to find optimal molecular structures with desired properties is a popular challenge, with applications in areas such as drug discovery. Genetic algorithms are a common approach to global-minimum molecular searches because they can search large regions of the energy landscape and reduce computational time via parallelization. To decrease the number of unstable intermediate structures produced and increase the overall efficiency of an evolutionary algorithm, clustering has been introduced in several instances. However, there is little literature detailing the effects of differentiating the selection frequencies between clusters. To balance exploration and exploitation in our genetic algorithm, we propose clustering the starting population and choosing clusters for an evolutionary algorithm run via a dynamic probability that depends on the fitness of the molecules generated by each cluster. We define four parameters, MFavOvrAll-A, MFavClus-B, NoNewFavClus-C, and Select-D, that correspond to a reward for producing the best structure overall, a reward for producing the best structure in its own cluster, a penalty for not producing the best structure, and a penalty based on the selection ratio of the cluster, respectively. A reward increases the probability of a cluster's future selection, while a penalty decreases it. To optimize these four parameters, we used a Gaussian distribution to approximate the evolutionary algorithm performance of each cluster and performed a grid search over different parameter combinations. Results show that parameter MFavOvrAll-A (rewarding clusters for producing the best structure overall) and parameter Select-D (appearance penalty) have a significantly larger effect than parameters MFavClus-B and NoNewFavClus-C. To produce the most successful models, a balance must be struck between MFavOvrAll-A and Select-D that reflects the exploitation vs exploration trade-off often seen in reinforcement learning algorithms. Results show that our reinforcement-learning-based method for selecting clusters outperforms an unclustered evolutionary algorithm for quinoline-like structure searches.
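The abstract names the four parameters but does not give the update rule, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes an additive update on per-cluster selection weights, that higher fitness is better, and that each evolutionary algorithm run can be stood in for by a draw from a per-cluster Gaussian, as the abstract describes for the tuning step. The function names (`update_weight`, `run_selection`, `grid_search`), the weight floor, and the test values are all hypothetical.

```python
import random
from itertools import product


def update_weight(weight, fitness, best_overall, best_in_cluster,
                  selection_ratio, a, b, c, d):
    """Return the new selection weight of the cluster that was just run.

    Assumed additive rule (not taken from the paper):
      a -> MFavOvrAll-A, reward for a new best structure overall
      b -> MFavClus-B, reward for a new best structure within the cluster
      c -> NoNewFavClus-C, penalty when no new best structure is produced
      d -> Select-D, penalty scaled by how often the cluster was selected
    """
    if fitness > best_overall:
        weight += a
    elif fitness > best_in_cluster:
        weight += b
    else:
        weight -= c
    weight -= d * selection_ratio
    # Keep the weight strictly positive so the cluster stays selectable.
    return max(weight, 1e-6)


def run_selection(means, stds, a, b, c, d, steps=200, seed=0):
    """Simulate cluster selection with Gaussian stand-ins for EA runs."""
    rng = random.Random(seed)
    n = len(means)
    weights = [1.0] * n
    counts = [0] * n
    best_in_cluster = [float("-inf")] * n
    best_overall = float("-inf")
    for step in range(1, steps + 1):
        # Pick a cluster with probability proportional to its weight.
        k = rng.choices(range(n), weights=weights)[0]
        counts[k] += 1
        # Surrogate for the fitness produced by one EA run on cluster k.
        fitness = rng.gauss(means[k], stds[k])
        weights[k] = update_weight(weights[k], fitness, best_overall,
                                   best_in_cluster[k], counts[k] / step,
                                   a, b, c, d)
        best_in_cluster[k] = max(best_in_cluster[k], fitness)
        best_overall = max(best_overall, fitness)
    return best_overall, counts


def grid_search(means, stds, values, repeats=5):
    """Try every (a, b, c, d) combination and keep the best average result."""
    best = None
    for a, b, c, d in product(values, repeat=4):
        score = sum(run_selection(means, stds, a, b, c, d, seed=s)[0]
                    for s in range(repeats)) / repeats
        if best is None or score > best[0]:
            best = (score, (a, b, c, d))
    return best
```

In this sketch, a large `a` relative to `d` concentrates selections on clusters that have already produced the best structure (exploitation), while a large `d` pushes selections back toward rarely chosen clusters (exploration), which mirrors the trade-off the abstract describes between MFavOvrAll-A and Select-D.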