Molecular fingerprint-based machine learning assisted QSAR model development for prediction of ionic liquid properties

Yi Ding, Minchun Chen, Chao Guo, Peng Zhang, Jingwen Wang
DOI: https://doi.org/10.1016/j.molliq.2020.115212
IF: 6
2021-03-01
Journal of Molecular Liquids
Abstract: Ionic liquids (ILs) have many applications in, for example, organic synthesis, batteries and drug delivery. In this study, molecular fingerprints (MF) were used to represent ILs and were combined with machine learning (ML) to develop quantitative structure-activity relationship (QSAR) models for predicting the refractive index and viscosity of ILs. To demonstrate the effectiveness of this approach, four datasets of different sizes, containing refractive index and viscosity data for different numbers of ILs and previously used to develop QSAR models by the molecular descriptor (MD)-based method and the group contribution method (GCM), were employed to develop QSAR models by the MF-ML method. The results showed that the MF-ML models achieved predictive performance comparable to the MD-based method and GCM on these four datasets, while MF-ML can obtain the representation of an IL more quickly, within milliseconds. Moreover, the MF-ML models were interpreted by the recently developed Shapley additive explanations (SHAP) method. The results showed that the models made their predictions based on a reasonable understanding of how different features affect the related properties of ILs, which builds trust in the MF-ML models. This study offered a new approach with theoretical support for rapidly developing trustworthy QSAR models to predict the properties of ILs.
chemistry, physical; physics, atomic, molecular & chemical
What problem does this paper attempt to address?
The problem this paper attempts to address is the development of a fast and reliable method to predict the refractive index and viscosity of ionic liquids (ILs). Specifically, the researchers combine molecular fingerprints (MF) with machine learning (ML) to establish quantitative structure-activity relationship (QSAR) models, as an alternative to traditional methods based on molecular descriptors (MD) or group contribution methods (GCM). Although these traditional methods can achieve high prediction accuracy on specific datasets, they have drawbacks such as high computational cost and long processing time. By using molecular fingerprints, the researchers aim to obtain representations of ionic liquids quickly and to combine them with condition information such as temperature and pressure, improving the efficiency and accuracy of predictions.

### Main Objectives of the Paper:

1. **Develop efficient prediction models**: Use molecular fingerprints and machine learning methods to quickly generate representations of ionic liquids and establish QSAR models to predict their refractive index and viscosity.
2. **Compare the performance of different methods**: Compare the model based on molecular fingerprints with traditional models based on molecular descriptors and group contribution methods to verify the effectiveness and accuracy of the new method.
3. **Explain the prediction mechanism of the model**: Use the SHAP method to explain the model's prediction results, showing how different features (such as temperature, pressure, atomic groups) affect the properties of ionic liquids, enhancing the interpretability and credibility of the model.

### Research Background:

- **Applications of ionic liquids**: Ionic liquids have wide applications in fields such as organic synthesis, batteries, and drug delivery, but experimental screening for ionic liquids with specific properties is very time-consuming and expensive.
- **Limitations of traditional methods**: Methods based on molecular descriptors and group contribution methods are effective but have high computational costs when dealing with large-scale datasets and limited predictive ability for new ionic liquids.
- **Advantages of molecular fingerprints**: Molecular fingerprints are binary vectors that encode chemical structure features. They are easy to obtain, quick to generate, and have low computational costs, making them suitable for handling large-scale datasets.

### Research Methods:

- **Datasets**: Use four datasets of different sizes, containing refractive index and viscosity data for different numbers of ionic liquids.
- **Model development**: Use the molecular fingerprints of ionic liquids as input, combined with information on conditions such as temperature and pressure, and train QSAR models using the XGBoost algorithm.
- **Model evaluation**: Evaluate the prediction performance of the models using root mean square error (RMSE) and coefficient of determination (R²).
- **Model interpretation**: Use the SHAP method to explain the prediction mechanism of the models, showing the impact of different features on the prediction results.

### Main Findings:

- **Prediction performance**: The XGBoost model based on molecular fingerprints shows performance comparable to traditional methods in predicting the refractive index and viscosity of ionic liquids, while computing the representations faster.
- **Model interpretation**: The SHAP method reveals how the model makes predictions based on features such as temperature, pressure, and atomic groups, demonstrating the model's rationality and credibility.

### Conclusion:

This study demonstrates that QSAR models based on molecular fingerprints and machine learning are an effective method for accurately predicting the refractive index and viscosity of ionic liquids in a short time. This method not only improves prediction efficiency but also enhances the interpretability and credibility of the model through the SHAP method, and is expected to be widely applied in the prediction of other properties of ionic liquids.
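
To make the input representation and the evaluation metrics described above more concrete, here is a minimal Python sketch using only the standard library. It is not the authors' implementation. The function `toy_fingerprint` is a hypothetical stand-in that hashes SMILES substrings into a bit vector, whereas the paper uses real molecular fingerprints. The bit length, the units (kelvin and kilopascal), the example SMILES string, and all function names are assumptions made for illustration. The XGBoost training and SHAP steps are omitted.

```python
import hashlib
import math
from typing import List


def toy_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> List[int]:
    """Hash every substring of length 1..max_len into a fixed-size bit vector.

    This is a toy stand-in for a real molecular fingerprint (assumption):
    it works on raw SMILES characters, not on the molecular graph.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


def build_features(smiles: str, temperature_k: float, pressure_kpa: float,
                   n_bits: int = 64) -> List[float]:
    """Concatenate the fingerprint bits with the measurement conditions."""
    return toy_fingerprint(smiles, n_bits) + [temperature_k, pressure_kpa]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative input: a SMILES string for 1-butyl-3-methylimidazolium chloride.
features = build_features("CCCCn1cc[n+](C)c1.[Cl-]", 298.15, 101.325)
print(len(features))  # 66: 64 fingerprint bits plus temperature and pressure
```

In a full pipeline, one such feature row per data point would be passed to a gradient-boosted regressor, and `rmse` and `r2` would be computed on held-out predictions.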