Self-supervised Representations and Node Embedding Graph Neural Networks for Accurate and Multi-scale Analysis of Materials

Jian-Gang Kong, Ke-Lin Zhao, Jian Li, Qing-Xu Li, Yu Liu, Rui Zhang, Jia-Ji Zhu, Kai Chang
2024-06-05
Abstract: Supervised machine learning algorithms, such as graph neural networks (GNN), have successfully predicted material properties. However, the superior performance of GNN usually relies on end-to-end learning on large material datasets, which may lose the physical insight of multi-scale information about materials. And the process of labeling data consumes many resources and inevitably introduces errors, which constrains the accuracy of prediction. We propose to train the GNN model by self-supervised learning on the node and edge information of the crystal graph. Compared with the popular manually constructed material descriptors, the self-supervised atomic representation can reach better prediction performance on material properties. Furthermore, it may provide physical insights by tuning the range information. Applying the self-supervised atomic representation on the magnetic moment datasets, we show how they can extract rules and information from the magnetic materials. To incorporate rich physical information into the GNN model, we develop the node embedding graph neural networks (NEGNN) framework and show significant improvements in the prediction performance. The self-supervised material representation and the NEGNN framework may investigate in-depth information from materials and can be applied to small datasets with increased prediction accuracy.
Materials Science, Computational Physics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address several key issues in materials science:

1. **Large-scale Data Dependency**: Existing supervised learning methods, such as Graph Neural Networks (GNN), typically rely on large-scale datasets for end-to-end learning when predicting material properties. This not only consumes significant resources but may also lose physical insights into the multi-scale information of materials.
2. **High Cost and Error-prone Data Annotation**: The process of annotating data requires expensive experimental or computational costs and inevitably introduces errors, limiting the accuracy of predictions.
3. **Limitations of Manually Constructed Descriptors**: Although manually constructed material descriptors perform well in some cases, their length increases rapidly with the number of elements in the dataset or the number of atoms in the unit cell, limiting their application in more diverse material datasets. Additionally, these descriptors typically encode only nearest-neighbor information, lacking larger-scale information.

### Solutions

To address the above issues, the authors propose the following solutions:

1. **Self-supervised Learning Strategy**: Training the GNN model through self-supervised learning on node and edge information of crystal graphs. Compared to popular manually constructed material descriptors, self-supervised atomic representations can achieve better predictive performance and provide physical insights by adjusting the range of information.
2. **Node Embedding Graph Neural Network (NEGNN) Framework**: To incorporate rich physical information into the GNN model, the authors developed the NEGNN framework, significantly improving predictive performance. Self-supervised material representations and the NEGNN framework can deeply mine information from materials and improve prediction accuracy on small datasets.

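As a rough illustration of the self-supervised strategy in item 1, the sketch below hides a random fraction of node labels in a toy crystal graph and scores how many a model recovers. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `mask_nodes`, `recovery_accuracy`, and `MASK`, the default `mask_ratio`, and the toy site list are all hypothetical, and the GNN that would produce the predictions is omitted. Edge attributes could be masked in the same way.

```python
import random

MASK = -1  # hypothetical sentinel for a hidden atomic number (assumption)


def mask_nodes(atomic_numbers, mask_ratio=0.15, seed=0):
    """Hide a random fraction of node labels.

    Returns the masked label list and a dict {node index: true label}
    that serves as the self-supervised training target.
    The default mask_ratio is an assumption; the summary only says
    "a certain proportion".
    """
    rng = random.Random(seed)
    n = len(atomic_numbers)
    n_mask = max(1, round(mask_ratio * n))  # always mask at least one node
    masked_idx = sorted(rng.sample(range(n), n_mask))
    masked = list(atomic_numbers)
    for i in masked_idx:
        masked[i] = MASK
    targets = {i: atomic_numbers[i] for i in masked_idx}
    return masked, targets


def recovery_accuracy(predictions, targets):
    """Fraction of masked nodes whose true label was recovered."""
    hits = sum(1 for i, z in targets.items() if predictions.get(i) == z)
    return hits / len(targets)


# Toy example: atomic numbers for eight sites (Fe = 26, O = 8), illustrative only.
sites = [26, 8, 8, 26, 8, 8, 26, 8]
masked_sites, targets = mask_nodes(sites, mask_ratio=0.25)
print(masked_sites, targets)
```

In a real pipeline, a GNN would read `masked_sites` together with the graph structure and be trained to maximize `recovery_accuracy` (or a differentiable surrogate such as cross-entropy) on `targets`; no labeled property data is needed for this step.
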
### Specific Implementation

- **Self-supervised Learning Task**: The authors designed a prediction task that randomly masks a certain proportion of node and edge information in the crystal graph, so that the model must recover the masked information from the surrounding environment. In this way, the model gradually learns the rules of material chemical composition and captures high-level information about the local structure.
- **Multi-scale Atomic Representation**: Multi-scale descriptors are constructed by concatenating the single-scale descriptors from different GNN layers. This alleviates the over-smoothing problem of deep GNNs and integrates information from different spatial scales, better capturing complex interactions in materials.
- **Experimental Validation**: The authors generated an experimental magnetic moment dataset and used it to demonstrate the superior performance of self-supervised atomic representations in predicting local magnetic moments of solid materials. They further analyzed the information content of the atomic representations using t-SNE visualization.

### Experimental Results

- **Magnetic Moment Prediction**: Self-supervised atomic representations performed well in predicting magnetic moments, and combining them with manually constructed descriptors (such as OFM) significantly enhanced predictive performance.
- **Advantages of Multi-scale Representations**: Multi-scale atomic representations showed stronger robustness and lower prediction errors in predicting the magnetic moments of transition metals and lanthanides, particularly when dealing with the higher complexity of lanthanides.
- **Other Material Properties**: The authors also applied self-supervised material representations to predict formation energy, band gap, and elastic properties. The results showed that multi-scale descriptors outperformed manually constructed descriptors in predicting various material properties.

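The multi-scale atomic representation described under Specific Implementation can be sketched as follows: run several message-passing layers and join each node's vector from every layer into one descriptor, so that layer k contributes information from roughly k hops away. This is a hedged sketch only. The parameter-free mean aggregation in `message_pass` is a stand-in for the learned GNN layers, and the function names, the chain-shaped toy graph, and the two-dimensional input features are all assumptions made for illustration.

```python
def message_pass(features, neighbors):
    """One parameter-free layer: average each node's vector with its neighbors'.

    A stand-in (assumption) for a learned GNN layer.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        out.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return out


def multi_scale(features, neighbors, n_layers=3):
    """Concatenate each node's vector after every layer into one descriptor."""
    per_layer = []
    h = features
    for _ in range(n_layers):
        h = message_pass(h, neighbors)
        per_layer.append(h)
    descriptors = []
    for i in range(len(features)):
        vec = []
        for layer in per_layer:
            vec.extend(layer[i])  # layer k carries roughly k-hop information
        descriptors.append(vec)
    return descriptors


# Toy graph: a chain of four sites with two-dimensional one-hot element features.
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
descriptors = multi_scale(features, neighbors, n_layers=3)
print(len(descriptors), len(descriptors[0]))
```

Keeping the shallow-layer vectors alongside the deep ones is what counters over-smoothing in this construction: even if the deepest layer's outputs become nearly identical across nodes, the earlier blocks of the descriptor still distinguish local environments.
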
### Conclusion

Through the self-supervised learning strategy and the NEGNN framework, the authors address the data-dependency, annotation-cost, and descriptor-scaling issues outlined above, providing new methods and tools for material design and property prediction.