Local environment-based machine learning for molecular adsorption energy prediction

Yifan Li, Yihan Wu, Yuhang Han, Qujie Lyu, Hao Wu, Xiuying Zhang, Lei Shen
2023-11-20
Abstract: Most machine learning (ML) models in materials science are built on global geometric features, which often fall short in describing localized characteristics such as molecular adsorption on materials. In this study, we introduce a local environment framework that extracts local features from crystal structures to describe the environment surrounding specific adsorption sites. Using the OC20 database (~20,000 3D entries), we apply our local environment framework to several ML models, including random forest, convolutional neural network, and graph neural network models. Our framework achieves high accuracy in predicting molecular adsorption energy, significantly outperforming the global-environment-based models examined. Moreover, the framework reduces data requirements and increases computational speed, particularly for deep learning algorithms. Finally, we directly apply our Local Environment ResNet (LERN) to the small 2DMatPedia database (~2,000 2D entries), where it also achieves highly accurate predictions, demonstrating the model's transferability and data efficiency. Overall, the prediction accuracy, data-utilization efficiency, and transferability of our local-environment-based ML framework make it promising for a broad range of molecular adsorption applications, such as catalysis and sensor technologies.
Materials Science
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to improve the accuracy of adsorption energy prediction by proposing a new local environment description method. Specifically:

1. **Local Environment Description Method**:
   - A method based on local environment features is proposed to improve the input features of machine learning models.
   - An improved Voronoi tessellation technique extracts the geometric characteristics of 3D structures and is combined with a new fingerprint method to provide local information about adsorption sites.
2. **Improving Prediction Accuracy**:
   - The local environment description method significantly improves the prediction accuracy of hydrogen atom adsorption energy in catalytic processes.
   - The method performs well on both large-scale datasets (such as the OC20 database) and small-scale datasets (such as the 2DMatPedia database).
3. **Applicable to Various Models**:
   - Local environment features are applied to traditional machine learning algorithms (such as Random Forest) and deep learning algorithms (such as MolCGCNN and LERN).
   - The approach performs better on smaller datasets and reduces the need for computational resources.
4. **Efficiency and Transferability**:
   - The LERN model handles outliers well and maintains high accuracy with limited data samples.
   - The model trains faster than other neural network models and transfers well to other datasets.

In summary, this study proposes a novel local environment description method that not only improves the accuracy of adsorption energy prediction but also shows broad application potential across different materials and datasets.
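
To make the idea of a site-centered descriptor concrete, here is a minimal sketch of a local fingerprint. It is not the paper's method: it swaps the improved Voronoi tessellation for a simple distance cutoff with radial shells, ignores periodic boundary conditions, and uses a hypothetical function name (`local_fingerprint`) and made-up toy coordinates. It only illustrates how features can describe the surroundings of one adsorption site rather than the whole structure.

```python
import math

def local_fingerprint(site, atoms, cutoff=6.0, n_bins=6):
    """Radial fingerprint of the environment around one adsorption site.

    Assumption: a plain distance cutoff stands in for a Voronoi-based
    neighbor search, and the cell is treated as non-periodic.

    site  -- (x, y, z) of the adsorption site, in angstroms
    atoms -- list of (atomic_number, (x, y, z)) for the substrate atoms
    Returns per-shell neighbor counts followed by per-shell mean atomic number.
    """
    width = cutoff / n_bins
    counts = [0] * n_bins
    z_sums = [0.0] * n_bins
    for atomic_number, position in atoms:
        distance = math.dist(site, position)
        if distance >= cutoff:
            continue  # atom lies outside the local environment
        shell = min(int(distance / width), n_bins - 1)
        counts[shell] += 1
        z_sums[shell] += atomic_number
    means = [s / c if c else 0.0 for s, c in zip(z_sums, counts)]
    return [float(c) for c in counts] + means

# Toy example: a site 1.5 angstroms above a small Pt/Cu patch, plus one far atom.
site = (0.0, 0.0, 1.5)
atoms = [
    (78, (0.0, 0.0, 0.0)),
    (78, (2.8, 0.0, 0.0)),
    (29, (0.0, 2.8, 0.0)),
    (78, (10.0, 10.0, 0.0)),  # beyond the cutoff, so it is ignored
]
fp = local_fingerprint(site, atoms, cutoff=6.0, n_bins=3)
print(fp)  # [1.0, 2.0, 0.0, 78.0, 53.5, 0.0]
```

The resulting fixed-length vector could be fed to any tabular regressor, such as a random forest. Because the distant atom does not affect the output, the descriptor depends only on the neighborhood of the site, which is the property the summary attributes to local environment features.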