Optimized Crystallographic Graph Generation for Material Science

Astrid Klipfel, Yaël Frégier, Adlane Sayede, Zied Bouraoui
2023-06-07
Abstract: Graph neural networks are widely used in machine learning applied to chemistry, and in particular for material science discovery. For crystalline materials, however, generating graph-based representations from geometrical information for neural networks is not a trivial task. The periodicity of crystalline structures requires efficient implementations to be processed in real time in a massively parallel environment. With the aim of training graph-based generative models for new material discovery, we propose an efficient tool to generate cutoff graphs and k-nearest-neighbours graphs of periodic structures with GPU optimization. We provide pyMatGraph, a PyTorch-compatible framework to generate graphs in real time during the training of neural network architectures. Our tool can update the graph of a structure, making generative models able to update the geometry and process the updated graph during the forward propagation on the GPU side. Our code is publicly available at https://github.com/aklipf/mat-graph.
Materials Science, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of generating graph representations of crystalline materials for use in materials science discovery. Specifically, it focuses on how to efficiently build graphs suitable for neural networks from geometric information when the structure is periodic. Because a crystal repeats in space, an atom's neighbours may lie in adjacent copies of the unit cell, so graph generation needs an implementation that is fast enough for real-time processing and that runs in a massively parallel environment. In addition, a graph-based generative model may update the geometry of the structure during forward propagation, which requires the graph to be updated dynamically to reflect those changes. The paper therefore proposes an efficient tool to generate cutoff graphs and k-nearest-neighbor graphs, and provides a PyTorch-compatible framework, `pyMatGraph`, which runs the graph generation on the GPU and supports real-time graph generation while training neural network architectures.

The main contributions of the paper are as follows:

1. **Efficient graph generation algorithm**: An improved version of the KD-tree search algorithm is proposed, which is adapted to periodic structures and can quickly generate k-nearest-neighbor graphs and cutoff graphs.
2. **Data structure adapted to a massively parallel environment**: A data structure suited to the GPU is designed to track the k nearest neighbors of each atom.
3. **Dynamic graph update**: The graph can be updated during forward propagation, so a generative model can keep the graph representation in sync with the geometry it modifies.
4. **Performance evaluation**: The efficiency of the tool on large-scale data sets is verified through experiments, in particular the significant speed-up obtained on the GPU.

These advances support the discovery of new materials in materials science, for example the development of new solar cell materials with specific band gaps, which contributes to clean energy production and storage.
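
To make the two graph types concrete, the following is a minimal brute-force sketch in pure Python of a cutoff graph and a k-nearest-neighbor graph on a periodic structure. It is not the KD-tree or GPU implementation of `pyMatGraph`, and the function names (`frac_to_cart`, `periodic_pairs`, `cutoff_graph`, `knn_graph`) are hypothetical. It assumes atoms are given in fractional coordinates, the lattice is a 3x3 matrix of row vectors, and searching one layer of neighbouring cells (`images=1`) is enough, which holds only when the cutoff (or the distance to the k-th neighbour) is smaller than the cell dimensions.

```python
import itertools
import math


def frac_to_cart(frac, cell):
    """Convert fractional coordinates to Cartesian (cell rows are lattice vectors)."""
    return tuple(sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3))


def periodic_pairs(frac_coords, cell, images=1):
    """Yield (distance, i, j, shift) for atom i and every periodic image of atom j.

    `shift` is the integer cell offset applied to atom j. Only offsets in
    [-images, images] along each lattice vector are searched (an assumption).
    """
    shifts = list(itertools.product(range(-images, images + 1), repeat=3))
    for i, fi in enumerate(frac_coords):
        ri = frac_to_cart(fi, cell)
        for j, fj in enumerate(frac_coords):
            for shift in shifts:
                if i == j and shift == (0, 0, 0):
                    continue  # skip the atom itself, keep its periodic images
                rj = frac_to_cart([fj[k] + shift[k] for k in range(3)], cell)
                yield math.dist(ri, rj), i, j, shift


def cutoff_graph(frac_coords, cell, cutoff, images=1):
    """Directed edges (i, j, shift) for all pairs closer than `cutoff`."""
    return [
        (i, j, s)
        for d, i, j, s in periodic_pairs(frac_coords, cell, images)
        if d <= cutoff
    ]


def knn_graph(frac_coords, cell, k, images=1):
    """Directed edges (i, j, shift) to the k nearest periodic neighbours of each atom."""
    candidates = {}
    for d, i, j, s in periodic_pairs(frac_coords, cell, images):
        candidates.setdefault(i, []).append((d, j, s))
    edges = []
    for i in sorted(candidates):
        # Ties in distance are broken by (j, shift) ordering, an arbitrary choice.
        for _, j, s in sorted(candidates[i])[:k]:
            edges.append((i, j, s))
    return edges


if __name__ == "__main__":
    # Cubic cell of edge 2.0 with atoms at the corner and the body centre.
    cell = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
    for edge in cutoff_graph(atoms, cell, cutoff=1.8):
        print(edge)
```

Each edge carries the cell offset of the destination atom, so the same pair of atoms can be connected several times through different periodic images. When a generative model moves the atoms or changes the lattice, this sketch simply has to be called again on the new geometry; the cost of doing so naively is what motivates the optimized, GPU-side approach described in the paper.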