Predictive crystallography at scale: mapping, validating, and learning from 1,000 crystal energy landscapes

Graeme Day, Christopher R. Taylor, Patrick W. V. Butler
DOI: https://doi.org/10.26434/chemrxiv-2024-0c329
2024-05-20
Abstract: Computational crystal structure prediction (CSP) is an increasingly powerful technique in materials discovery, due to its ability to reveal trends and permit insight across the possibility space of crystal structures of a candidate molecule, beyond simply the observed structure(s). In this work, we demonstrate the reliability and scalability of CSP methods for small, rigid organic molecules by performing in-depth CSP investigations for over 1000 such compounds, the largest survey of its kind to date. We show that this highly efficient force-field-based CSP approach is superbly predictive, locating 99.4% of observed experimental structures, and ranking a large majority of these (74%) as among the most stable possible structures (to within uncertainty due to thermal effects). We present two examples of insights such large predicted datasets can permit, examining the space group preferences of organic molecular crystals and rationalising empirical rules concerning the spontaneous resolution of chiral molecules. Finally, we exploit this large and diverse dataset for developing transferable machine-learned energy potentials for the organic solid state, training a neural network lattice energy correction to force field energies that offers substantial improvements to the already impressive energy rankings, and a MACE equivariant message-passing neural network for crystal structure reoptimisation. We conclude that the excellent performance and reliability of the CSP workflow enables the creation of very large datasets of broad utility and explanatory power in materials design.
Chemistry
What problem does this paper attempt to address?
This paper addresses the use of computational crystal structure prediction (CSP) at large scale in materials discovery, specifically the mapping, validation, and learning of crystal energy landscapes for small, rigid organic molecules. The authors carried out in-depth CSP investigations for over 1000 such compounds, which they describe as the largest survey of its kind to date, with the aim of demonstrating the reliability and scalability of the method. Their efficient force-field-based approach located 99.4% of the observed experimental structures and ranked 74% of these among the most stable possible structures, to within the uncertainty due to thermal effects. The paper gives two examples of the insights such large predicted datasets permit: examining the space group preferences of organic molecular crystals, and rationalising empirical rules concerning the spontaneous resolution of chiral molecules. The dataset was also used to develop transferable machine-learned energy potentials for the organic solid state. A neural network lattice energy correction to the force field energies improves the energy rankings, and a MACE equivariant message-passing neural network is trained for crystal structure reoptimisation. The authors conclude that the performance and reliability of the CSP workflow enable the creation of very large datasets of broad utility and explanatory power in materials design. These datasets and machine learning models are expected to further support the design and discovery of new materials.
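The two headline figures (the fraction of experimental structures located, and the fraction of those ranked within an energy window of the global minimum) can be illustrated with a minimal sketch. This is not the authors' code: the `Landscape` class, the `summarise` function, the toy energies, and the 2.0 kJ/mol window are all hypothetical and chosen only to show how such statistics might be computed from a set of predicted landscapes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Landscape:
    """Hypothetical container for one molecule's predicted crystal energy landscape."""
    energies: List[float]        # lattice energies of predicted structures, kJ/mol
    match_index: Optional[int]   # index of the structure matching experiment, None if not located


def relative_energy(landscape: Landscape) -> Optional[float]:
    """Energy of the experimental match above the global minimum, or None if not located."""
    if landscape.match_index is None:
        return None
    return landscape.energies[landscape.match_index] - min(landscape.energies)


def summarise(landscapes: List[Landscape], window_kj_mol: float) -> dict:
    """Fraction of matches located, and fraction of located matches within the window."""
    located = [l for l in landscapes if l.match_index is not None]
    within = [l for l in located if relative_energy(l) <= window_kj_mol]
    return {
        "located_fraction": len(located) / len(landscapes),
        "within_window_fraction": len(within) / len(located) if located else 0.0,
    }


# Toy data (invented for illustration, not taken from the paper).
toy = [
    Landscape([-100.0, -98.5, -97.0], 0),   # match is the global minimum
    Landscape([-80.0, -79.0, -75.0], 2),    # match lies 5.0 kJ/mol above the minimum
    Landscape([-60.0, -59.5], None),        # experimental structure not located
    Landscape([-120.0, -119.0], 1),         # match lies 1.0 kJ/mol above the minimum
]
stats = summarise(toy, window_kj_mol=2.0)   # the window value is an assumption
print(stats)
```

In this sketch the second figure is reported as a fraction of the located structures, following the abstract's wording ("a large majority of these"). The appropriate energy window depends on how the thermal uncertainty is estimated, which the abstract does not specify.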