Leveraging Deep Chemical Language Processing for Co-crystal Prediction

Rıza Özçelik, Rebecca Birolo, Andrea Aramini, Roberto Gobetto, Michele Remo Chierotti, Francesca Grisoni
DOI: https://doi.org/10.26434/chemrxiv-2024-vgvhk
2024-05-29
Abstract: Most hits identified in drug discovery pipelines, and as many as 40% of marketed drugs, suffer from suboptimal pharmacokinetic profiles. Co-crystallization, wherein a drug (or drug candidate) and another organic molecule form a multi-component crystal, can optimize the physicochemical properties of those molecules without hampering their pharmacological activity. However, finding promising co-crystal pairs is resource-intensive due to the vast search space. Here we propose DeepCrystal, a deep learning model based on chemical language to predict co-crystallization. We rigorously validate DeepCrystal and find that it achieves 78% accuracy in realistic settings and outperforms existing models. Leveraging the chemical language to represent molecules, DeepCrystal can estimate the uncertainty of its predictions. We exploit this capability in a challenging prospective study and discover two novel co-crystals of diflunisal, an anti-inflammatory drug. This prospective study exemplifies a successful application of deep learning to accelerate co-crystallization in the lab, highlighting its potential in both academic and industrial settings.
Chemistry
What problem does this paper attempt to address?
The problem this paper addresses is how to improve drug solubility through co-crystallization without altering a drug's chemical structure or pharmacological activity. Specifically, the authors propose DeepCrystal, a deep learning model based on chemical language processing, to predict how likely a drug molecule and a co-former are to co-crystallize. Traditional co-crystal screening is time-consuming and resource-intensive; DeepCrystal aims to accelerate it through fast in silico prediction, reducing experimental time and cost.

### Main Research Content:

1. **Background Introduction**:
   - Over 40% of marketed drugs and new chemical entities have low solubility.
   - Co-crystallization can improve drug solubility by incorporating another organic molecule (the co-former) into the crystal structure, but finding a suitable co-former is complex and time-consuming.
2. **Proposed Method**:
   - **DeepCrystal Model**: a deep learning model based on chemical language. Molecules are represented as SMILES strings, convolutional neural networks (CNNs) learn latent representations of the molecules, and fully connected layers then predict the likelihood of co-crystallization.
   - **SMILES Augmentation**: multiple valid SMILES strings are generated for the same molecule, which improves the model's generalization and balances the ratio of positive to negative samples in the dataset.
3. **Model Validation**:
   - DeepCrystal was evaluated on internal and external test sets; it reached a balanced accuracy of 78% on the external test set and outperformed existing models.
   - Ablation studies showed that both the chemical language representation and SMILES augmentation contribute to the model's performance.
4. **Uncertainty Estimation**:
   - The authors developed an uncertainty estimate based on test-time SMILES augmentation: the model scores several SMILES variants of the same input, and the standard deviation of these predictions indicates how reliable the prediction is.
5. **Prospective Study**:
   - DeepCrystal was used to screen four structurally similar co-former candidates and correctly predicted two new diflunisal co-crystals, which were confirmed experimentally.

### Conclusion:

By adopting a chemical language representation and SMILES augmentation, DeepCrystal improves the accuracy and generalization of co-crystal prediction. It provides an effective tool for designing drug co-crystals and could accelerate the development of new drug formulations.
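Before a CNN can read SMILES strings, a chemical language model has to split each string into tokens and turn the tokens into vectors. The sketch below shows one common way to do this: a regular-expression tokenizer followed by one-hot encoding with zero padding. The regex, the names `tokenize`, `build_vocab`, and `one_hot`, and the padding/truncation scheme are illustrative assumptions, not DeepCrystal's published preprocessing.

```python
import re

# Regex SMILES tokenizer of the kind commonly used by chemical language models.
# Assumption: DeepCrystal's own tokenizer may differ in detail.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[()\.=#\-+\\/:~@?>*$]|\d)"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens (bracket atoms, Cl/Br, bonds, ring digits)."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # findall silently skips characters it cannot match, so check nothing was lost
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Map every token seen in the corpus to a column index."""
    return {tok: i for i, tok in enumerate(sorted({t for toks in token_lists for t in toks}))}


def one_hot(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[list[int]]:
    """Encode tokens as a max_len x len(vocab) 0/1 matrix; extra rows are zero padding."""
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for i, tok in enumerate(tokens[:max_len]):  # longer sequences are truncated
        matrix[i][vocab[tok]] = 1
    return matrix


tokens = tokenize("NC(=O)c1cccnc1")  # nicotinamide, a common co-former
vocab = build_vocab([tokens])
x = one_hot(tokens, vocab, max_len=20)
```

Multi-character tokens such as `Cl`, `Br`, and bracket atoms like `[nH]` stay intact, which is the usual reason for a regex tokenizer rather than splitting SMILES into single characters.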
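To make the "CNN encoder plus fully connected layers" description concrete, here is a minimal, untrained forward pass in pure Python. It assumes a single encoder shared by the drug and the co-former, one convolution layer with ReLU and global max pooling, and one dense output unit with a sigmoid. The layer sizes, the shared encoder, and the names `ToyPairCNN`, `conv1d_relu_maxpool`, and `one_hot_indices` are hypothetical; the sketch only illustrates the data flow, not DeepCrystal's actual architecture or weights.

```python
import math
import random

Matrix = list[list[float]]


def conv1d_relu_maxpool(x: Matrix, kernels: list[Matrix]) -> list[float]:
    """One feature per kernel: max over positions of ReLU(kernel * window).

    x is an L x V one-hot matrix; each kernel is k x V. No bias, stride 1, no padding.
    """
    vocab_size = len(x[0])
    features = []
    for w in kernels:
        k = len(w)
        best = 0.0  # starting at 0 applies ReLU: max(ReLU(s_i)) == max(0, max s_i)
        for start in range(len(x) - k + 1):
            s = sum(w[j][v] * x[start + j][v] for j in range(k) for v in range(vocab_size))
            best = max(best, s)
        features.append(best)
    return features


class ToyPairCNN:
    """Hypothetical, untrained two-input model: shared CNN encoder + dense sigmoid head."""

    def __init__(self, vocab_size: int, n_filters: int = 4, kernel_size: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.kernels = [
            [[rng.gauss(0.0, 0.5) for _ in range(vocab_size)] for _ in range(kernel_size)]
            for _ in range(n_filters)
        ]
        # dense layer over the concatenated [drug; co-former] embedding
        self.dense_w = [rng.gauss(0.0, 0.5) for _ in range(2 * n_filters)]
        self.dense_b = 0.0

    def encode(self, x: Matrix) -> list[float]:
        return conv1d_relu_maxpool(x, self.kernels)

    def predict(self, drug_x: Matrix, coformer_x: Matrix) -> float:
        h = self.encode(drug_x) + self.encode(coformer_x)  # list concatenation
        z = sum(w * v for w, v in zip(self.dense_w, h)) + self.dense_b
        return 1.0 / (1.0 + math.exp(-z))  # score in (0, 1)


def one_hot_indices(indices: list[int], vocab_size: int, max_len: int) -> Matrix:
    """Helper so the example runs on its own: token indices -> padded one-hot rows."""
    m = [[0.0] * vocab_size for _ in range(max_len)]
    for i, j in enumerate(indices):
        m[i][j] = 1.0
    return m


model = ToyPairCNN(vocab_size=5, n_filters=4, kernel_size=3, seed=1)
drug = one_hot_indices([0, 1, 2, 3], vocab_size=5, max_len=8)
coformer = one_hot_indices([4, 4, 1], vocab_size=5, max_len=8)
score = model.predict(drug, coformer)
```

Because the two embeddings are concatenated, swapping drug and co-former changes the input vector; one option, if order should not matter, is to combine the embeddings symmetrically (for example by summing them) or to train on both orders.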
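SMILES augmentation relies on the fact that one molecule has many valid SMILES strings (for example, `CCO` and `OCC` both denote ethanol). The sketch below oversamples whichever class is smaller by adding pairs written with alternative SMILES until both classes have the same size. It assumes the alternative strings were generated beforehand (in practice usually with a cheminformatics toolkit); the `variants` dictionary, the function name `balance_with_smiles_variants`, the cycling rule, and the toy labels are hypothetical, not the paper's exact procedure or data.

```python
import random


def balance_with_smiles_variants(pairs, labels, variants, seed=0):
    """Oversample the minority class with pairs rewritten as alternative SMILES.

    pairs    : list of (drug_smiles, coformer_smiles)
    labels   : list of 0/1 labels (1 = co-crystal), same length as pairs
    variants : dict mapping a SMILES string to equivalent SMILES strings
               (hypothetical input, e.g. precomputed with a cheminformatics toolkit)
    Returns new (pairs, labels) lists in which both classes have the same size.
    """
    rng = random.Random(seed)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    minority = 1 if n_pos < n_neg else 0
    deficit = abs(n_pos - n_neg)
    minority_pairs = [p for p, y in zip(pairs, labels) if y == minority]
    out_pairs, out_labels = list(pairs), list(labels)
    if not minority_pairs:  # nothing to augment from
        return out_pairs, out_labels
    for i in range(deficit):
        drug, coformer = minority_pairs[i % len(minority_pairs)]  # cycle through minority pairs
        # pick a random spelling of each molecule; falls back to the original string
        new_pair = (
            rng.choice(variants.get(drug, [drug])),
            rng.choice(variants.get(coformer, [coformer])),
        )
        out_pairs.append(new_pair)
        out_labels.append(minority)
    return out_pairs, out_labels


variants = {
    "OC(=O)c1ccccc1": ["OC(=O)c1ccccc1", "O=C(O)c1ccccc1", "c1ccc(cc1)C(=O)O"],  # benzoic acid
    "CCO": ["CCO", "OCC", "C(C)O"],  # ethanol
}
# Toy labels for illustration only, not real co-crystallization outcomes.
pairs = [("OC(=O)c1ccccc1", "CCO")] * 3 + [("CCO", "OC(=O)c1ccccc1")]
labels = [1, 1, 1, 0]
aug_pairs, aug_labels = balance_with_smiles_variants(pairs, labels, variants)
```

The added pairs are chemically identical to existing ones, so a model sees new strings without new chemistry; this is why such augmentation mainly helps the model become robust to how a molecule is written.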
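Balanced accuracy, the metric reported for the external test set, is the mean of sensitivity TP / (TP + FN) and specificity TN / (TN + FP), so a model cannot score well simply by favoring the majority class. The short example below computes it on hypothetical toy labels and contrasts it with plain accuracy; the function name `balanced_accuracy` is illustrative.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Imbalanced toy example: 4 co-crystals (1), 2 non-co-crystals (0).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 4/6, about 0.667
ba = balanced_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2 = 0.625

# A trivial model that always predicts "co-crystal" still gets 4/6 plain accuracy,
# but only 0.5 balanced accuracy: (1 + 0) / 2.
ba_trivial = balanced_accuracy(y_true, [1] * 6)
```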
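The test-time augmentation idea behind the uncertainty estimate can be sketched as follows: score every combination of drug and co-former SMILES variants, then report the mean prediction and its standard deviation. The `predict` callable stands in for a trained model, and the cut-offs in `triage` (0.5 for the class, 0.1 for the spread) are hypothetical values chosen for illustration. The summary does not state how DeepCrystal sets such thresholds or whether it uses the population or the sample standard deviation; this sketch uses the population one.

```python
import itertools
import statistics
from typing import Callable


def tta_predict(predict: Callable[[str, str], float],
                drug_variants: list[str],
                coformer_variants: list[str]) -> tuple[float, float]:
    """Score every drug x co-former SMILES combination; return (mean, std) of the scores."""
    scores = [predict(d, c) for d, c in itertools.product(drug_variants, coformer_variants)]
    # population standard deviation (assumption; the sample version is statistics.stdev)
    return statistics.mean(scores), statistics.pstdev(scores)


def triage(mean: float, std: float, threshold: float = 0.5, max_std: float = 0.1) -> str:
    """Hypothetical decision rule: flag high-spread predictions before trusting the class."""
    if std > max_std:
        return "uncertain"
    return "co-crystal" if mean >= threshold else "no co-crystal"


# Stand-in for a trained model whose score depends on how the drug SMILES is written,
# the kind of inconsistency that test-time augmentation exposes.
def toy_predict(drug: str, coformer: str) -> float:
    return 0.8 if drug.startswith("O") else 0.2


mean, std = tta_predict(toy_predict, ["OCC", "CCO"], ["c1ccccc1"])
decision = triage(mean, std)  # spread of 0.3 exceeds max_std, so "uncertain"
```

A model that gives the same score for every spelling of a pair yields a standard deviation of zero, so its prediction passes the spread check and is classified by the mean alone.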