GAtor: A First Principles Genetic Algorithm for Molecular Crystal Structure Prediction

Farren Curtis, Xiayue Li, Timothy Rose, Álvaro Vázquez-Mayagoitia, Saswata Bhattacharya, Luca M. Ghiringhelli, Noa Marom
DOI: https://doi.org/10.1039/C8FD00067K
2018-02-23
Abstract: We present the implementation of GAtor, a massively parallel, first principles genetic algorithm (GA) for molecular crystal structure prediction. GAtor is written in Python and currently interfaces with the FHI-aims code to perform local optimizations and energy evaluations using dispersion-inclusive density functional theory (DFT). GAtor offers a variety of fitness evaluation, selection, crossover, and mutation schemes. Breeding operators designed specifically for molecular crystals provide a balance between exploration and exploitation. Evolutionary niching is implemented in GAtor by using machine learning to cluster the dynamically updated population by structural similarity and then employing a cluster-based fitness function. Evolutionary niching promotes uniform sampling of the potential energy surface by evolving several sub-populations, which helps overcome initial pool biases and selection biases (genetic drift). The various settings offered by GAtor increase the likelihood of locating numerous low-energy minima, including those located in disconnected, hard-to-reach regions of the potential energy landscape. The best structures generated are re-relaxed and re-ranked using a hierarchy of increasingly accurate DFT functionals and dispersion methods. GAtor is applied to a chemically diverse set of four past blind test targets, characterized by different types of intermolecular interactions. The experimentally observed structures and other low-energy structures are found for all four targets. In particular, for Target II, 5-cyano-3-hydroxythiophene, the top ranked putative crystal structure is a $Z^\prime$=2 structure with P$\bar{1}$ symmetry and a scaffold packing motif, which has not been reported previously.
Materials Science
What problem does this paper attempt to address?
This paper addresses the difficulty of crystal structure prediction (CSP) for molecular crystals. To that end, the authors developed GAtor, a genetic algorithm for predicting molecular crystal structures. The problem is described in detail below.

### 1. **Unique Properties and Applications of Molecular Crystals**

Molecular crystals are held together by weak van der Waals forces (i.e., dispersion interactions). As a result, the same molecule may crystallize into multiple different solid forms, known as polymorphs. Polymorphs can differ significantly in their physical properties, which affects their performance in applications such as pharmaceuticals, organic electronics, pigments, and explosives.

- **Pharmaceutical Field**: Different polymorphs may exhibit different stabilities, solubilities, and compressibilities, affecting drug manufacturing, bioavailability, and efficacy.
- **Organic Electronics and Organic Photovoltaic Devices**: Different polymorphs have different optoelectronic properties, which directly affect device performance.

### 2. **Challenges in Crystal Structure Prediction (CSP)**

The goal of CSP is to compute the possible crystal structures of a molecule starting from its two-dimensional chemical diagram. This task is very challenging for the following reasons:

- **Minimal Energy Differences**: The energy differences between molecular crystal polymorphs are usually only a few kJ/mol, so high-precision quantum-mechanical methods are required.
- **Complex Configuration Space**: The configuration space of molecular crystals is complex and multi-dimensional, containing many local minima and discontinuous derivatives.

### 3. **Existing Methods and Their Limitations**

Existing CSP methods include molecular dynamics, Monte Carlo methods, particle swarm optimization, and random search, among others. Although each of these methods has its advantages, they are limited in their ability to explore complex configuration spaces, especially for multimodal or multi-objective optimization problems.

### 4. **Design Goals of GAtor**

To address these problems, the authors developed GAtor, a first-principles genetic algorithm that aims to improve molecular crystal structure prediction in the following ways:

- **Large-Scale Parallelization**: Make full use of high-performance computing resources to accelerate the structure prediction process.
- **Evolutionary Niching**: Use machine-learning clustering to dynamically group the population by structural similarity, then apply a cluster-based fitness function. This promotes uniform sampling of the potential energy surface and helps overcome initial pool bias and selection bias (genetic drift).
- **Multiple Fitness Evaluation, Selection, Crossover, and Mutation Schemes**: Provide diverse configuration options to adapt to the characteristics of different chemical systems.
- **High-Precision Energy Evaluation**: Use dispersion-inclusive density functional theory (DFT) for energy evaluation and local optimization to ensure the accuracy of the predictions.

### 5. **Specific Application Scenarios**

The paper applies GAtor to four past blind test targets, which represent different types of intermolecular interactions. For all four targets, GAtor finds the experimentally observed structures along with other low-energy structures. In particular, for Target II (5-cyano-3-hydroxythiophene), the top-ranked putative crystal structure is a previously unreported $Z^\prime$=2 structure with P$\bar{1}$ symmetry and a scaffold packing motif.

In conclusion, the paper tackles the complexity and high-precision requirements of molecular crystal structure prediction by developing GAtor, a tool intended to support materials science and related application fields.
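
The idea behind evolutionary niching with a cluster-based fitness function can be illustrated with a small sketch. This is not GAtor's code, and the exact fitness formula used in the paper may differ. The sketch assumes a simple fitness-sharing form in which an energy-based fitness is divided by the size of the cluster a structure belongs to. The function names (`energy_fitness`, `cluster_fitness`, `selection_probabilities`), the example energies, and the cluster labels are all hypothetical, and the labels are taken as given rather than computed by a clustering algorithm.

```python
from collections import Counter


def energy_fitness(energies):
    """Map energies to [0, 1]: the lowest energy gets 1, the highest gets 0."""
    e_min, e_max = min(energies), max(energies)
    if e_max == e_min:
        # Degenerate pool: every structure is equally fit.
        return [1.0 for _ in energies]
    return [(e_max - e) / (e_max - e_min) for e in energies]


def cluster_fitness(energies, labels):
    """Assumed fitness-sharing form: divide each energy-based fitness by the
    number of structures in the same cluster, so crowded clusters are
    penalized and sparsely populated ones are favored."""
    sizes = Counter(labels)
    return [f / sizes[c] for f, c in zip(energy_fitness(energies), labels)]


def selection_probabilities(fitness):
    """Roulette-wheel selection probabilities, proportional to fitness."""
    total = sum(fitness)
    if total == 0:
        return [1.0 / len(fitness) for _ in fitness]
    return [f / total for f in fitness]


# Hypothetical pool: relative energies (arbitrary units) and cluster labels
# that a structural-similarity clustering step might have produced.
energies = [-10.0, -9.8, -9.6, -9.0, -8.0]
labels = [0, 0, 0, 1, 2]

plain = energy_fitness(energies)
shared = cluster_fitness(energies, labels)
probs = selection_probabilities(shared)

for e, c, f, s, p in zip(energies, labels, plain, shared, probs):
    print(f"E={e:6.1f} cluster={c} fitness={f:.2f} shared={s:.2f} p={p:.2f}")
```

In this toy pool the three lowest-energy structures share one cluster, so their fitness is divided by three, and the lone structure in cluster 1 becomes the most likely parent even though it is not the lowest in energy. This is the sense in which a cluster-based fitness can counteract genetic drift toward a single over-sampled region.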