Advancing Material Property Prediction: Using Physics-Informed Machine Learning Models for Viscosity

Mohammad Atif Faiz Afzal, Alex K. Chew, Matthew Sender, Zachary Kaplan, Anand Chandrasekaran, Andrea R. Browning, H. Shaun Kwak, Mathew D. Halls, Jackson Chief Elk
DOI: https://doi.org/10.26434/chemrxiv-2023-1qfw8-v2
2024-01-17
Abstract: In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in constructing ML models, especially in the material science domain where data is limited. To address this, we integrate physics-informed descriptors from molecular dynamics (MD) simulations to enhance the accuracy and interpretability of ML models. Our current study focuses on accurately predicting viscosity in liquid systems using MD descriptors. In this work, we curated a comprehensive dataset of over 4,000 small organic molecules’ viscosities from scientific literature, publications, and online databases. This dataset enabled us to develop quantitative structure–property relationships (QSPR) consisting of descriptor-based and graph neural network models to predict temperature-dependent viscosities for a wide range of viscosities with considerable accuracy. The QSPR models reveal that including MD descriptors improves prediction accuracies of experimental viscosities, particularly at the small data set scale of fewer than a thousand data points. Furthermore, feature importance tools reveal that intermolecular interactions captured by MD descriptors are most important for accurate viscosity predictions. Finally, the QSPR models can accurately capture the inverse relationship between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original data set. Our research highlights the effectiveness of incorporating MD descriptors into QSPR models, which leads to improved accuracy for properties that are difficult to predict when using physics-based models alone or when limited data is available.
Chemistry
What problem does this paper attempt to address?
This paper attempts to address the challenge of accurately predicting material properties (such as viscosity, melting point, and glass transition temperature) in materials science. Specifically, the authors focus on improving the prediction accuracy of liquid-system viscosity by combining machine learning (ML) methods with physics-informed descriptors generated from molecular dynamics (MD) simulations. The main issues and solutions of the paper are as follows:

### Main Issues

1. **Limitations of Physics-Based Models**: It is very difficult to accurately calculate material properties such as viscosity solely through physics-based models.
2. **Challenges of Data-Driven Machine Learning Models**: Data in materials science is limited, which makes it challenging to build effective machine learning models.
3. **Improving Prediction Accuracy**: The open question is how to improve the accuracy of machine learning predictions for properties such as viscosity when data is limited.

### Solutions

1. **Dataset Construction**: The authors compiled a dataset of viscosities for more than 4,000 small organic molecules from scientific literature, publications, and online databases.
2. **Feature Extraction**: Molecules were featurized with RDKit descriptors, Morgan fingerprints, and Matminer descriptors, and physics-informed descriptors generated from MD simulations were added to the feature set.
3. **Model Development**: Descriptor-based quantitative structure–property relationship (QSPR) models and graph neural network (GNN) models were built to predict viscosity at different temperatures.
4. **Model Evaluation**: Model performance was evaluated with five-fold cross-validation (5-CV) and on a separate test set, to assess how well the models generalize beyond the data they were trained on.
5. **Feature Importance Analysis**: The SHAP method was used to quantify the contribution of each feature to the model predictions, revealing the importance of intermolecular interactions for viscosity prediction.

### Main Contributions

- **Improved Prediction Accuracy**: Adding MD descriptors improved the prediction accuracy of the ML models, particularly for small datasets (fewer than 1,000 data points).
- **Model Interpretability**: Feature importance analysis indicated that intermolecular interactions captured by the MD descriptors are the most important features for accurate viscosity prediction.
- **Temperature Dependence**: The models accurately captured the inverse relationship between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original dataset.

In summary, this paper improves the prediction accuracy of liquid-system viscosity by combining physics-informed descriptors with machine learning methods, and shows that this approach raises accuracy for properties that are difficult to predict with physics-based models alone or when only limited data is available.
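The 5-CV evaluation step can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code: the summary does not say how the folds were formed, so splitting by molecule (so that one molecule's measurements at different temperatures never land in both the training and the validation part of a fold) is an assumption here, and `group_k_fold` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def group_k_fold(molecule_ids, k=5, seed=0):
    """Yield (train_idx, val_idx) row-index lists for k-fold cross-validation.

    Assumption (not from the paper): all rows sharing a molecule ID are kept
    in the same fold, so a molecule measured at several temperatures never
    appears in both the training and the validation split of one fold.
    """
    unique_ids = sorted(set(molecule_ids))
    if len(unique_ids) < k:
        raise ValueError("need at least k distinct molecules")
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Deal the shuffled molecules into k folds round-robin.
    fold_of = {mol: i % k for i, mol in enumerate(unique_ids)}
    folds = defaultdict(list)
    for row, mol in enumerate(molecule_ids):
        folds[fold_of[mol]].append(row)
    for f in range(k):
        val_idx = folds[f]
        # Every row not in fold f is used for training.
        train_idx = sorted(r for g in range(k) if g != f for r in folds[g])
        yield train_idx, val_idx
```

Each row would be one (molecule, temperature, viscosity) record. A purely random row-wise split would let the same molecule appear at one temperature in training and at another in validation, which can inflate cross-validation scores; grouping avoids that. With scikit-learn installed, `sklearn.model_selection.GroupKFold` provides the same kind of grouped split.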
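The SHAP feature-importance analysis rests on Shapley values. The sketch below computes them exactly by enumerating every feature coalition, which is only feasible for a handful of features. It is not the SHAP library: SHAP uses faster approximations and typically averages over a background dataset, whereas this sketch assumes a single baseline point. The function name `shapley_values` is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition take their baseline value (a simplifying
    assumption). Cost grows as 2^n model calls, so keep n small.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        # Build an input that uses x for coalition members, baseline otherwise.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For example, for the toy model f(z) = 0.8 z0 + 0.1 z1 + 2.0 z2 + 0.5 z0 z2 at x = (1, 2, 3) with a zero baseline, the interaction term's contribution of 1.5 is split equally between z0 and z2, giving values of about 1.55, 0.2, and 6.75, which sum to f(x) - f(baseline) = 8.5.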
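The inverse viscosity-temperature relationship is often summarized by the empirical Andrade (Arrhenius-type) equation, ln(η) = A + B/T with B > 0. The summary does not say the paper's models use this form; the sketch below only illustrates how such a trend can be fitted to (T, η) pairs by ordinary least squares on 1/T, and `fit_andrade` and `predict_andrade` are hypothetical names.

```python
import math


def fit_andrade(temps_k, viscosities):
    """Fit ln(eta) = A + B / T by ordinary least squares.

    temps_k: temperatures in kelvin; viscosities: any consistent positive unit.
    Returns (A, B); B > 0 means viscosity falls as temperature rises.
    """
    if len(temps_k) != len(viscosities) or len(temps_k) < 2:
        raise ValueError("need at least two paired (T, eta) points")
    xs = [1.0 / t for t in temps_k]          # regress on inverse temperature
    ys = [math.log(eta) for eta in viscosities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_andrade(a, b, temp_k):
    """Viscosity predicted by the fitted Andrade equation at temp_k (kelvin)."""
    return math.exp(a + b / temp_k)
```

Fitting synthetic points generated with A = -3.0 and B = 1200 K recovers those constants. The positive B is what encodes the inverse relationship: predicted viscosity decreases as T increases.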