Beyond theory driven discovery: hot random search and datum derived structures

Chris J. Pickard
2024-07-09
Abstract: Data driven methods have transformed the prospects of the computational chemical sciences, with machine learned interatomic potentials (MLIPs) speeding up calculations by several orders of magnitude. I reflect on theory driven, as opposed to data driven, discovery based on ab initio random structure searching (AIRSS), and then introduce two methods which exploit machine learning acceleration. I show how long high throughput anneals, between direct structural relaxations, enabled by ephemeral data derived potentials (EDDPs), can be incorporated into AIRSS to bias the sampling of challenging systems towards low energy configurations. Hot AIRSS (hot-AIRSS) preserves the parallel advantage of random search, while allowing much more complex systems to be tackled. This is demonstrated through searches for complex boron structures in large unit cells. I then show how low energy carbon structures can be directly generated from a single, experimentally determined, diamond structure. In an extension to the generation of random sensible structures, candidates are stochastically generated and then optimised to minimise the difference between the EDDP environment vector and that of the reference diamond structure. The distance-based cost function is captured in an actively learned EDDP. Graphite, small nanotubes and caged, fullerene-like, structures emerge from searches using this potential, along with a rich variety of tetrahedral framework structures. Using the same approach, the pyrope, Mg$_3$Al$_2$(SiO$_4$)$_3$, garnet structure is recovered from a low energy AIRSS structure generated in a smaller unit cell with a different chemical composition. The relationship of this approach to modern diffusion model based generative methods is discussed.
Computational Physics,Materials Science,Chemical Physics
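The abstract's datum-derived step optimises randomly generated candidates so that their EDDP environment vectors match those of a single reference structure. The Python sketch below illustrates only that idea, in miniature, and several pieces are assumptions rather than the paper's method: a hypothetical two-body descriptor (per-atom sums of (1 - r/r_c)^p over neighbours) stands in for the EDDP environment vector, the cell is fixed to an 8-atom cubic box at the diamond lattice constant, and the cost is minimised directly with SciPy's L-BFGS-B instead of through an actively learned EDDP. The cutoff `RC`, the exponents `POWERS` and all helper names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

A = 3.567                                # cubic diamond lattice constant (angstrom)
RC = 1.75                                # hypothetical cutoff; kept below A/2 so minimum image is exact
POWERS = np.array([2.0, 3.0, 4.0, 6.0])  # hypothetical exponents of the radial terms


def diamond_positions(a=A):
    """Cartesian positions of the 8-atom conventional diamond cell."""
    fcc = np.array([[0, 0, 0], [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])
    frac = np.vstack([fcc, fcc + 0.25])  # second fcc sublattice shifted by (1/4, 1/4, 1/4)
    return frac * a


def environment_vectors(pos, a=A, rc=RC, powers=POWERS):
    """Per-atom descriptor: sum over neighbours j of (1 - r_ij/rc)^p, one entry per p."""
    d = pos[:, None, :] - pos[None, :, :]
    d -= a * np.round(d / a)             # minimum-image convention in a cubic cell
    r = np.linalg.norm(d, axis=-1)
    np.fill_diagonal(r, np.inf)          # an atom is not its own neighbour
    f = np.clip(1.0 - r / rc, 0.0, None) # zero beyond the cutoff
    return np.stack([(f ** p).sum(axis=1) for p in powers], axis=1)


# All diamond sites are equivalent, so one row serves as the reference ("datum").
REF = environment_vectors(diamond_positions())[0]


def cost(x):
    """Mean squared distance between each atom's vector and the reference vector."""
    env = environment_vectors(x.reshape(-1, 3))
    return float(np.mean(np.sum((env - REF) ** 2, axis=1)))


def datum_derived_search(n_atoms=8, seed=0):
    """One random start, optimised towards the reference environment."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.0, A, size=3 * n_atoms)   # random initial positions
    res = minimize(cost, x0, method="L-BFGS-B")  # finite-difference gradients
    return cost(x0), res.fun, res.x.reshape(-1, 3) % A
```

Calling `datum_derived_search(seed=s)` for many seeds mimics the independent, parallel starts of random search. Whether a particular run lands on the diamond arrangement depends on its start, so the sketch only guarantees that the cost does not increase from its random initial value.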
What problem does this paper attempt to address?
This paper discusses theory-driven, as opposed to data-driven, approaches to materials discovery, and in particular how machine learning can accelerate high-throughput structure prediction. The author uses ephemeral data-derived potentials (EDDPs), machine learned interatomic potentials that can be trained quickly, to accelerate ab initio random structure searching (AIRSS) and to sample low-energy configurations of complex systems. The paper introduces "hot-AIRSS", a search strategy that inserts long molecular dynamics anneals between structural relaxations and is suited to complex systems in large unit cells, such as boron. The author illustrates the importance of theory-driven discovery with several examples, including hydrogen mixtures, ionic ammonia, complex phases of aluminum, and the search for high-temperature superconductors.

The paper also extends the generation of random sensible structures to "datum-derived" structures: candidates are generated stochastically and then optimised so that their EDDP environment vectors match those of a single reference structure. Starting from diamond, this yields graphite, small nanotubes, fullerene-like cages and a variety of tetrahedral frameworks, and the same approach recovers the pyrope garnet structure from a low-energy AIRSS structure of different composition in a smaller cell. The paper relates this approach to modern generative methods based on diffusion models.

In the computational demonstrations, EDDPs make calculations fast enough that long molecular dynamics simulations become affordable. Hot-AIRSS raises the probability of finding low-energy structures by performing high-temperature anneals between structural relaxations, while keeping the parallel advantage of independent random searches. Its effectiveness is shown for high-pressure phases of boron, where complex structures that were previously difficult to find are located. Overall, the paper addresses how machine learning and data-driven methods can make structure prediction in materials science faster and more efficient, and so facilitate the discovery of new materials.
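Hot-AIRSS, as summarised above, places long high-temperature anneals between direct relaxations and keeps the lowest-energy relaxed structure, while every random start stays independent. The Python sketch below shows only that control flow on a toy 13-atom Lennard-Jones cluster in reduced units. Everything specific is an assumption: the Lennard-Jones potential stands in for an EDDP, a Metropolis Monte Carlo anneal with a soft spherical wall replaces the molecular dynamics anneal used in the paper, and the schedule, step size and helper names (`random_start`, `relax`, `anneal`, `hot_airss`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

N_ATOMS = 13  # toy Lennard-Jones cluster (epsilon = sigma = 1), standing in for an EDDP


def lj_energy_grad(x):
    """Total LJ energy and its gradient for a flat coordinate vector."""
    pos = x.reshape(-1, 3)
    d = pos[:, None, :] - pos[None, :, :]
    r2 = np.sum(d * d, axis=-1)
    np.fill_diagonal(r2, 1.0)                   # placeholder to avoid division by zero
    inv6 = r2 ** -3
    pair = 4.0 * (inv6 * inv6 - inv6)
    np.fill_diagonal(pair, 0.0)
    energy = 0.5 * pair.sum()                   # each pair appears twice in the matrix
    coef = (-48.0 * inv6 * inv6 + 24.0 * inv6) / r2  # (dE/dr) / r
    np.fill_diagonal(coef, 0.0)
    grad = np.sum(coef[:, :, None] * d, axis=1)
    return energy, grad.ravel()


def random_start(rng, n=N_ATOMS, box=3.0, rmin=0.8):
    """Random 'sensible' start: uniform in a cube, rejecting points closer than rmin."""
    pos = []
    while len(pos) < n:
        trial = rng.uniform(-box / 2, box / 2, size=3)
        if all(np.linalg.norm(trial - p) >= rmin for p in pos):
            pos.append(trial)
    return np.array(pos).ravel()


def relax(x):
    """Direct local relaxation to the nearest minimum."""
    res = minimize(lj_energy_grad, x, jac=True, method="L-BFGS-B")
    return res.x, res.fun


def anneal(x, rng, steps=3000, t_hot=0.4, t_cold=0.02, step=0.15, radius=2.5):
    """Metropolis Monte Carlo anneal with a linearly decreasing temperature."""
    pos = x.reshape(-1, 3) - x.reshape(-1, 3).mean(axis=0)  # recentre on the origin

    def total(p):
        # LJ energy plus a soft spherical wall that stops atoms evaporating
        over = np.clip(np.linalg.norm(p, axis=1) - radius, 0.0, None)
        return lj_energy_grad(p.ravel())[0] + 10.0 * np.sum(over ** 2)

    e = total(pos)
    for k in range(steps):
        temp = t_hot + (t_cold - t_hot) * k / (steps - 1)
        trial = pos.copy()
        trial[rng.integers(len(pos))] += rng.uniform(-step, step, size=3)
        e_new = total(trial)
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / temp):
            pos, e = trial, e_new
    return pos.ravel()


def hot_airss(seed=0, cycles=3):
    """One independent trajectory: relax, then alternate anneal and relax, keep the best."""
    rng = np.random.default_rng(seed)
    x, e = relax(random_start(rng))
    plain = e                                   # what plain AIRSS would report for this start
    best_x, best_e = x, e
    for _ in range(cycles):
        x, e = relax(anneal(x, rng))
        if e < best_e:
            best_x, best_e = x, e
    return plain, best_e, best_x
```

Because each `hot_airss(seed)` call is independent, many seeds can be run in parallel just as in plain AIRSS. Comparing `plain` (the first relaxation alone) with the returned best energy across seeds shows how far the anneals bias sampling towards low-energy minima.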