LoGAN: Local generative adversarial network for novel structure prediction

Esther Heid, Péter Kovács, Georg K. H. Madsen
DOI: https://doi.org/10.26434/chemrxiv-2024-vf9l1
2024-04-09
Abstract: The efficient generation and filtering of candidate structures for new materials is becoming increasingly important as starting points for computational studies. In this work, we introduce an approach to Wasserstein generative adversarial networks for predicting unique crystal and molecular structures. Leveraging translation- and rotation-invariant atom-centered local descriptors addresses some of the major challenges faced by similar methods. Our models require only small sets of known structures as training data. Furthermore, the approach is able to generate both non-periodic and periodic structures based on local coordination. We showcase the data efficiency and versatility of the LoGAN approach by recovering all stable C5H12O isomers using only 39 C4H10O and C6H14O training examples, as well as all known low-energy SiO2 crystal structures utilizing only 167 training examples of other SiO2 crystal structures. We also introduce a filtration technique to reduce the computational cost of subsequent characterization steps by selecting samples from unique basins on the potential energy surface, which minimizes the number of geometry relaxations needed after structure generation. LoGAN thus represents a new, versatile approach to generative modeling of crystal and molecular structures in the low-data regime, and is available as open source.
Chemistry
What problem does this paper attempt to address?
The paper addresses the problem of efficiently generating and screening candidate structures in the discovery of new materials. Specifically, the authors propose a new method based on Wasserstein generative adversarial networks (GANs), called the Local Generative Adversarial Network (LoGAN), for predicting novel crystal and molecular structures. The method uses translation- and rotation-invariant atom-centered local descriptors to overcome some of the challenges faced by similar methods, and it requires only a small set of known structures as training data. Because it works from local coordination, it can generate both periodic and non-periodic structures. The paper demonstrates the data efficiency and versatility of LoGAN in low-data scenarios. For example, using only 39 training examples of C₄H₁₀O and C₆H₁₄O, it recovered all stable C₅H₁₂O isomers; using only 167 training examples of other SiO₂ crystal structures, it recovered all known low-energy SiO₂ crystal structures. Additionally, the paper introduces a filtering technique that reduces the computational cost of subsequent characterization steps by selecting samples from unique basins on the potential energy surface, thereby minimizing the number of geometry relaxations needed after structure generation. In summary, LoGAN represents a new, versatile method for generating crystal and molecular structures under low-data conditions, and it is available as open source.
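As a rough illustration of two ideas in this summary, the following Python sketch builds a toy atom-centered descriptor from sorted neighbor distances, which is unchanged under translation and rotation, and then greedily discards near-duplicate structures. It is not the authors' implementation. The function names, the cutoff, the tolerance, and the use of descriptor distance as a stand-in for "same basin on the potential energy surface" are all assumptions made for this example; the descriptors and filtration technique used in the paper may differ. The sketch also assumes a consistent atom ordering across structures and ignores periodicity and chemical species.

```python
import numpy as np


def local_descriptor(positions, center, cutoff=3.0, n_max=3):
    """Toy atom-centered descriptor (hypothetical): sorted distances from
    atom `center` to its neighbors within `cutoff`, padded to length `n_max`."""
    d = np.linalg.norm(positions - positions[center], axis=1)
    d = np.delete(d, center)            # drop the zero self-distance
    d = np.sort(d[d < cutoff])[:n_max]  # keep the n_max nearest neighbors
    return np.pad(d, (0, n_max - d.size), constant_values=cutoff)


def fingerprint(positions, cutoff=3.0, n_max=3):
    """Concatenate the per-atom descriptors (assumes fixed atom ordering)."""
    return np.concatenate(
        [local_descriptor(positions, i, cutoff, n_max) for i in range(len(positions))]
    )


def filter_unique(fingerprints, tol=0.1):
    """Greedy filter: keep a structure only if its fingerprint is farther
    than `tol` from every structure kept so far. Returns the kept indices."""
    kept = []
    for i, f in enumerate(fingerprints):
        if all(np.linalg.norm(f - fingerprints[j]) > tol for j in kept):
            kept.append(i)
    return kept


# Usage: a structure, a rotated and translated copy of it, and a different one.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 1.5]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
a_moved = a @ rot.T + np.array([2.0, -1.0, 0.5])
b = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [0.0, 1.4, 0.0], [0.0, 0.0, 1.4]])

fps = [fingerprint(s) for s in (a, a_moved, b)]
print(filter_unique(fps))  # [0, 2]: the rotated/translated copy is dropped
```

In a workflow of the kind the summary describes, a filter like this would run before geometry relaxation, so that only one representative per group of near-identical candidates is relaxed.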