Graph Neural Networks for Predicting Solubility in Diverse Solvents using MolMerger incorporating Solute-solvent Interactions

Vansh Ramani, Tarak Karmakar
2024-02-18
Abstract: Prediction of solubility has been a complex and challenging physicochemical problem that has tremendous implications in the chemical and pharmaceutical industries. Recent advancements in machine learning methods have provided great scope for reliably predicting the solubility of a large number of molecular systems. However, most of these methods rely on physical properties obtained from experiments and/or expensive quantum chemical calculations. Here, we developed a method that utilizes a graph representation of solute-solvent interactions using "MolMerger", which captures the strongest polar interactions between molecules using Gasteiger charges and creates a graph that reflects the interacting solute-solvent system. Using these graphs as input, a neural network learns the correlation between the structural properties of a molecule, in the form of node embeddings, and its physicochemical properties as output. This approach has been used to calculate molecular solubility by predicting the log solubility values of various organic molecules and pharmaceuticals in diverse sets of solvents.
Disordered Systems and Neural Networks
What problem does this paper attempt to address?
This paper attempts to address the challenge of predicting molecular solubility in various solvents. Specifically:

1. **Research Background**: Solubility is an important physicochemical property with significant implications in fields such as materials science, environmental chemistry, the food industry, chemical process optimization, biotechnology, cosmetic formulation, agrochemicals, the oil and gas industry, and drug development. Accurately predicting the solubility of molecules in different solvents can significantly reduce the time and cost of early-stage drug development and help screen for drug candidates with good solubility.
2. **Limitations of Existing Methods**: Most current machine learning methods for solubility prediction rely on physical properties obtained from experiments or from expensive quantum chemical calculations. Additionally, these methods are usually limited to aqueous solutions, with fewer predictions available for non-aqueous solvents.
3. **Proposed New Method**: The paper proposes a method based on Graph Neural Networks (GNNs) that introduces the "MolMerger" algorithm to explicitly capture solute-solvent interactions. MolMerger uses Gasteiger charges to identify the strongest polar interactions between solute and solvent molecules and builds a single graph representing the interacting system. Using these graphs as input, the neural network learns the relationship between molecular structure and solubility, and predicts the log solubility of various organic molecules and drugs in different solvents.
4. **Objective**: The method aims to overcome the tendency of existing methods to overfit to specific solvents, and to predict molecular solubility reliably across a wide range of solvents without relying on experimentally measured properties or expensive quantum chemical calculations. This makes the model applicable not only to aqueous solutions but also to organic solvents.
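
The graph-merging idea described above can be illustrated with a minimal sketch. It is one plausible reading of the description, not the authors' MolMerger implementation. The names `MolGraph` and `merge_graphs` are hypothetical, the partial charges are made-up illustrative values rather than computed Gasteiger charges, and scoring a polar contact by the product of opposite-signed charges is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MolGraph:
    """Hypothetical container: one partial charge per atom plus undirected bonds."""
    charges: list
    bonds: list


def merge_graphs(solute, solvent):
    """Join two molecular graphs with one extra edge across the strongest polar pair.

    Assumption: the "strongest polar interaction" is the solute/solvent atom
    pair with opposite-signed charges whose charge product is largest in
    magnitude. Returns the merged graph and the added edge (or None).
    """
    offset = len(solute.charges)  # solvent atom indices are shifted by this amount
    best_pair, best_score = None, 0.0
    for i, qi in enumerate(solute.charges):
        for j, qj in enumerate(solvent.charges):
            score = -qi * qj  # positive only when the charges have opposite signs
            if score > best_score:
                best_pair, best_score = (i, j + offset), score

    charges = list(solute.charges) + list(solvent.charges)
    bonds = list(solute.bonds) + [(a + offset, b + offset) for a, b in solvent.bonds]
    if best_pair is not None:
        bonds.append(best_pair)  # the solute-solvent interaction edge
    return MolGraph(charges, bonds), best_pair


# Illustrative (made-up) charges: a C-O-H fragment as solute, water as solvent.
solute = MolGraph(charges=[0.05, -0.40, 0.21], bonds=[(0, 1), (1, 2)])
solvent = MolGraph(charges=[-0.41, 0.205, 0.205], bonds=[(0, 1), (0, 2)])

merged, interaction_edge = merge_graphs(solute, solvent)
print(interaction_edge)
print(merged.bonds)
```

With these example values, the added edge connects the solute's hydroxyl hydrogen to the water oxygen, and the merged graph can then be passed to a GNN as a single input. In practice the charges would be computed rather than typed in, for example with RDKit's `ComputeGasteigerCharges`. Whether the authors use that exact routine is not stated here.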