Examining Generalizability of AI Models for Catalysis

Kamal Choudhary, Shih-Han Wang, Hongliang Xin, Luke Achenie
DOI: https://doi.org/10.26434/chemrxiv-2024-hj733
2024-09-25
Abstract: In this work, we investigate the generalizability of problem-specific machine-learning models for catalysis across different datasets and adsorbates, and examine the potential of unified models as pre-screening tools for density functional theory (DFT) calculations. We develop graph neural network models for 12 different catalysis datasets and then cross-evaluate their performance. The unified models considered include ALIGNN-FF, MATGL, CHGNet, and MACE. Pearson correlation coefficient analysis indicates that generalizability improves when similar adsorbates are used for training and testing or when a larger database is employed for training. Results demonstrate that, while the accuracy of the unified models has room for improvement, their excellent performance in predicting trends in adsorption energies makes them a valuable pre-screening tool for selecting potential candidates prior to resource-intensive DFT calculations in catalyst design, thereby reducing computational expense. The tools used in this work will be made available at https://github.com/usnistgov/catalysismat.
Chemistry
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Generalizability of problem-specific models**: Investigates whether machine-learning models trained for one catalytic problem can be applied effectively to different datasets or adsorbates.
2. **Accuracy of unified models**: Explores whether unified models trained on large-scale datasets (such as ALIGNN-FF, MATGL, CHGNet, and MACE) can accurately predict adsorption energies on smaller datasets.
3. **Feasibility of unified models as screening tools**: Examines whether unified models can remove the need to generate problem-specific datasets, serving instead as a pre-screening step before density functional theory (DFT) calculations to reduce computational cost.

Through these studies, the paper aims to answer the following key questions:

- How do problem-specific models perform on datasets other than the one they were trained on?
- Can unified models be used to predict adsorption energies for specific small-scale datasets?
- Can unified models effectively serve as pre-screening tools before DFT calculations in catalyst design?
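
The difference between predicting the trend of adsorption energies and predicting their absolute values can be illustrated with a minimal sketch. This is not the authors' code: the energies are made-up numbers, the constant 0.5 eV offset is an assumed error pattern chosen for illustration, and `prescreen` with its "lowest predicted energy" criterion is a hypothetical helper. It only shows how a model can have a sizeable mean absolute error while still ranking candidates the same way as the reference.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def mean_absolute_error(x, y):
    """Mean absolute difference between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(x - y)))

def prescreen(candidates, predicted, keep):
    """Hypothetical pre-screen: keep the `keep` candidates with the
    lowest predicted energy. The selection criterion is an assumption;
    a real study would use its own target."""
    order = np.argsort(predicted)[:keep]
    return [candidates[i] for i in order]

# Made-up reference energies in eV (illustrative only, not from the paper).
dft = np.array([-1.20, -0.85, -0.40, -0.10, 0.30])
# Assumed model error: a constant shift, so the trend is preserved exactly.
model = dft + 0.50

r = pearson_r(dft, model)               # 1 up to rounding: trend reproduced
mae = mean_absolute_error(dft, model)   # 0.5 eV: absolute values are off
shortlist = prescreen(["A", "B", "C", "D", "E"], model, keep=2)

print(f"Pearson r = {r:.3f}, MAE = {mae:.3f} eV, shortlist = {shortlist}")
```

In this toy case the shortlist matches what the reference energies would give, even though every prediction is wrong by 0.5 eV. Real model errors are not a constant shift, so the correlation would be below 1 and the shortlist should be checked against DFT.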