Genetic algorithms are strong baselines for molecule generation

Austin Tripp, José Miguel Hernández-Lobato
2023-10-14
Abstract: Generating molecules, both in a directed and undirected fashion, is a huge part of the drug discovery pipeline. Genetic algorithms (GAs) generate molecules by randomly modifying known molecules. In this paper we show that GAs are very strong algorithms for such tasks, outperforming many complicated machine learning methods: a result which many researchers may find surprising. We therefore propose insisting during peer review that new algorithms must have some clear advantage over GAs, which we call the GA criterion. Ultimately our work suggests that a lot of research in molecule generation should be re-assessed.
Neural and Evolutionary Computing, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The paper examines how well Genetic Algorithms (GAs) perform on molecule generation tasks and argues that they should be treated as a standard baseline. Specifically, it addresses the following core issues:

1. **Evaluating the effectiveness of Genetic Algorithms**: Experiments on unconditional molecule generation and single-objective optimization tasks show that GAs often outperform, or at least match, many more complex machine learning methods.
2. **Proposing the GA criterion**: Given this performance, the paper proposes a standard for evaluating new algorithms: a new method should have some clear advantage over GAs, whether empirical or conceptual.
3. **Reflecting on current research practices**: The paper argues that the molecule generation field suffers from some poor experimental practices, such as trying to show that a novel algorithm is best while neglecting a thorough evaluation of existing baselines. The authors therefore call for the GA criterion to be enforced during peer review.
4. **Exploring future directions**: Finally, the paper suggests one possible reason why many new algorithms fail to surpass GAs: they may essentially be generating variants of known molecules, only in an indirect way. Future research could explore this point further.

In summary, the paper systematically evaluates GAs on molecule generation tasks and proposes the GA criterion as a standard intended to improve research quality in the field.
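
The abstract describes GAs as generating molecules by randomly modifying known molecules. The loop behind that idea can be sketched in a few lines. The following is a toy illustration only, not the authors' implementation: "molecules" here are plain strings over a hypothetical four-letter alphabet, and the `score` objective, the operator definitions, and all parameter names are assumptions made for this example. A real molecular GA would use chemistry-aware mutation and crossover on valid molecular structures.

```python
import random

ALPHABET = "CNOF"  # hypothetical toy "atom" alphabet; strings are not real chemistry


def mutate(mol: str, rng: random.Random) -> str:
    """Return a copy of `mol` with one random substitution, insertion, or deletion."""
    ops = ["substitute", "insert"]
    if len(mol) > 1:  # never delete the last remaining character
        ops.append("delete")
    op = rng.choice(ops)
    pos = rng.randrange(len(mol))
    if op == "substitute":
        return mol[:pos] + rng.choice(ALPHABET) + mol[pos + 1:]
    if op == "insert":
        return mol[:pos] + rng.choice(ALPHABET) + mol[pos:]
    return mol[:pos] + mol[pos + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Join a non-empty prefix of `a` to a non-empty suffix of `b`."""
    cut_a = rng.randrange(1, len(a) + 1)
    cut_b = rng.randrange(len(b))
    return a[:cut_a] + b[cut_b:]


def score(mol: str) -> int:
    """Toy objective (an assumption): reward 'N' characters, penalize length over 10."""
    return mol.count("N") - max(0, len(mol) - 10)


def run_ga(seeds, generations=50, population_size=20, offspring=40, seed=0):
    """Evolve a population starting from known 'molecules' and return it, best first."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            parent_a = rng.choice(population)
            parent_b = rng.choice(population)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        # Elitist selection: keep the best unique candidates among parents and children.
        # Ties are broken alphabetically so that runs are reproducible.
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda m: (-score(m), m))[:population_size]
    return population
```

Calling `run_ga(["CCO", "CCN", "CNC"])` returns the final population sorted from best to worst score. Because parents compete with their children for survival, the best score in the population never decreases from one generation to the next.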