Genetic programming-based learning of carbon interatomic potential for materials discovery

Andrew Eldridge, Alejandro Rodriguez, Ming Hu, Jianjun Hu
DOI: https://doi.org/10.48550/arXiv.2204.00735
2022-04-02
Abstract: Efficient and accurate interatomic potential functions are critical to computational study of materials while searching for structures with desired properties. Traditionally, potential functions or energy landscapes are designed by experts based on theoretical or heuristic knowledge. Here, we propose a new approach to leverage strongly typed parallel genetic programming (GP) for potential function discovery. We use a multi-objective evolutionary algorithm with NSGA-III selection to optimize individual age, fitness, and complexity through symbolic regression. With a DFT dataset of 863 unique carbon allotrope configurations drawn from 858 carbon structures, the generated potentials are able to predict total energies within $\pm 7.70$ eV at low computational cost while generalizing well across multiple carbon structures. Our code is open source and available at http://www.github.com/usccolumbia/mlpotential
Materials Science; Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The paper addresses how to construct interatomic potential functions for carbon efficiently and accurately for use in materials discovery. Traditionally, these potential functions, or energy landscapes, are designed by experts based on theoretical or heuristic knowledge. That approach demands substantial domain expertise, involves a complex design process, and yields functions that are difficult to optimize. The paper therefore proposes using strongly typed parallel genetic programming (GP) to discover interatomic potential functions automatically. The search is a symbolic regression driven by a multi-objective evolutionary algorithm with NSGA-III selection, which optimizes individual age, fitness, and complexity. The goal is potential functions that predict the total energy of many carbon structures accurately at relatively low computational cost. The training data is a DFT dataset of 863 unique carbon allotrope configurations drawn from 858 carbon structures. The generated potentials predict total energies within ±7.70 eV and generalize well across multiple carbon structures. These results indicate that genetic programming can effectively discover and optimize interatomic potential functions, offering a new and efficient option for molecular dynamics simulation in materials science.
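To make the multi-objective idea concrete, the following is a minimal Python sketch of Pareto (non-dominated) filtering over the three objectives named above. It is not the authors' implementation and it is not NSGA-III itself: NSGA-III adds reference-point-based niching on top of non-dominated sorting, which is omitted here. The `Candidate` type, the example values, and the assumption that all three objectives (age, energy error, complexity) are minimized are illustrative only.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one evolved potential function."""
    name: str         # label for the candidate expression (illustrative)
    age: int          # generations since its lineage entered the population
    error: float      # energy prediction error in eV, lower is better
    complexity: int   # e.g. node count of the expression tree


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    Assumption: all three objectives are minimized.
    """
    a_obj = (a.age, a.error, a.complexity)
    b_obj = (b.age, b.error, b.complexity)
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


# Made-up values, chosen only to exercise the dominance check.
population = [
    Candidate("A", age=5, error=9.0, complexity=7),
    Candidate("B", age=5, error=7.5, complexity=7),    # dominates A
    Candidate("C", age=1, error=12.0, complexity=3),   # young and simple
    Candidate("D", age=8, error=7.0, complexity=15),   # most accurate
    Candidate("E", age=8, error=7.5, complexity=15),   # dominated by B and D
]

front = pareto_front(population)
print([c.name for c in front])  # ['B', 'C', 'D']
```

Candidates B, C, and D survive because each is best on at least one trade-off: B balances all three, C is youngest and simplest, and D has the lowest error. Keeping such a front, rather than a single best individual, is what lets the search trade accuracy against expression size and lineage age.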