Abstract: Hydration free energy (HFE) of molecules is a fundamental property of importance throughout chemistry and biology. Calculating the HFE with classical molecular dynamics simulation-based approaches can be challenging and expensive. Machine learning (ML) models are increasingly being used to predict HFE. Although ML models achieve impressive accuracy on small-molecule datasets, they suffer from a lack of interpretability. In this work, we have developed a physics-based ML model with only six descriptors, which is both accurate and fully interpretable, and applied it to FreeSolv, a database of small-molecule HFEs. We have evaluated the electrostatic energy with an approximate closed form of the Generalized Born (GB) model and also use the polar surface area as a descriptor. In addition, we use logP, the numbers of hydrogen bond acceptors and donors, and the number of rotatable bonds as descriptors. We have used different ML models, such as random forest and extreme gradient boosting (XGBoost). The best of these models has a mean absolute error of only 0.74 kcal/mol. The main strength of this model is that the descriptors have clear physical meaning: the electrostatic descriptor and the polar surface area, followed by the hydrogen bond donors and acceptors, were found to be the most important factors for calculating the hydration free energy.
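The abstract does not state which closed-form GB expression is used. As a minimal sketch, assuming the widely used pairwise form of Still et al. (1990) and precomputed effective Born radii, the polarization energy is ΔG_GB = -½ (1/ε_in - 1/ε_out) Σ_i Σ_j q_i q_j / f_GB(r_ij), with f_GB = sqrt(r_ij² + a_i a_j exp(-r_ij² / (4 a_i a_j))). The function names, the default dielectric constants (ε_in = 1, ε_out = 78.5) and the Born radii passed in are illustrative assumptions; computing the Born radii themselves is outside this sketch.

```python
import math
from typing import Sequence, Tuple

# Coulomb constant in kcal*Angstrom/(mol*e^2), so energies come out in kcal/mol.
COULOMB_KCAL = 332.0636


def f_gb(r_ij: float, a_i: float, a_j: float) -> float:
    """Still-type smooth function: ~r_ij at long range, ~Born radius at r_ij = 0."""
    aa = a_i * a_j
    return math.sqrt(r_ij ** 2 + aa * math.exp(-r_ij ** 2 / (4.0 * aa)))


def gb_energy(
    charges: Sequence[float],
    coords: Sequence[Tuple[float, float, float]],
    born_radii: Sequence[float],
    eps_in: float = 1.0,
    eps_out: float = 78.5,
) -> float:
    """Generalized Born polarization energy (kcal/mol), hypothetical helper.

    Sums over all ordered atom pairs (i, j), including i == j, which gives the
    Born self-energy terms -1/2 * k * (1/eps_in - 1/eps_out) * q_i^2 / a_i.
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])  # Euclidean distance in Angstrom
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total
```

For a single ion of unit charge with a 2 Å Born radius, `gb_energy([1.0], [(0.0, 0.0, 0.0)], [2.0])` reduces to the Born self-energy, about -82 kcal/mol. The double loop keeps the sketch readable; a vectorized version would be preferable for large molecules.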

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **How can the hydration free energy (HFE) of small molecules be predicted accurately by machine-learning models that use a small number of physically interpretable descriptors?**

### Specific problem background:

1. **Importance of hydration free energy**:
   - Hydration free energy (HFE) is a fundamental property in chemistry and biology and is crucial for understanding the behavior of molecules in solvents.
   - Traditional methods that calculate HFE with classical molecular dynamics simulations are both complex and expensive.
2. **Limitations of existing methods**:
   - Although machine-learning (ML) models perform well in predicting HFE, they often lack interpretability, which makes it difficult to understand how they work or why they make errors.

### Goals of the paper:

- Develop a physics-based machine-learning model that predicts the HFE of small molecules from as few descriptors as possible, so that the model is both accurate and fully interpretable.
- Validate the model on the FreeSolv database, which contains 643 small organic molecules and their experimentally measured HFE values.

### Main contributions:

- **Descriptor selection**: Only six descriptors with clear physical meanings are used: polar surface area, number of hydrogen-bond donors, number of hydrogen-bond acceptors, logP, number of rotatable bonds, and a charge term (the sum of the GB term and the Coulomb electrostatic term).
- **Model performance**: Several machine-learning models, including Random Forest, XGBoost, Gradient Boosting, and LightGBM, are trained; the best result has a mean absolute error (MAE) of only 0.74 kcal/mol.
- **Interpretability**: Because the descriptors have clear physical meanings, the model's predictions can be explained directly; the charge term and the polar surface area have the largest effects on HFE.
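The summary ranks the charge term and the polar surface area as the most influential descriptors but does not say how importance was measured. One common, model-agnostic option, assumed here rather than taken from the paper, is permutation importance: shuffle one descriptor column and record how much the mean absolute error (MAE) rises. This minimal sketch works with any fitted regressor exposed as a `predict(row)` callable; the `FEATURES` ordering and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical column order for the six descriptors listed above.
FEATURES = [
    "charge_term",         # GB term + Coulomb electrostatic term
    "polar_surface_area",
    "h_bond_donors",
    "h_bond_acceptors",
    "logP",
    "rotatable_bonds",
]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE in the units of y (kcal/mol for hydration free energies)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MAE when each descriptor column is shuffled on its own."""
    rng = random.Random(seed)  # seeded so the ranking is reproducible
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Swap in the shuffled column j, leaving the other descriptors intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            mae = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(mae - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A descriptor the model ignores scores exactly zero, while a descriptor it relies on raises the MAE when shuffled. scikit-learn provides the same idea as `sklearn.inspection.permutation_importance`, which also accepts a configurable scoring function.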
### Summary:

By introducing a physics-based machine-learning model, the paper addresses both the complexity and high cost of traditional simulation methods and the lack of interpretability of existing machine-learning models, providing an efficient and easy-to-understand HFE prediction tool for fields such as drug design.

Physics-based Machine Learning to Predict Hydration Free Energies for Small Molecules with a minimal number of descriptors: Interpretable and Accurate
