Enabling Open Machine Learning of DNA Encoded Library Selections to Accelerate the Discovery of Small Molecule Protein Binders
Rafael Couñago, James Wellnitz, Shabbir Ahmad, Nabin Begale, Jermiah Joseph, Hong Zeng, Albina Bolotokova, Aiping Dong, Shaghayegh Reza, Pegah Ghiabi, Gibson Elisa, Xuemin Cheng, Guiping Tu, Xianyang Li, Jian Liu, Dengfeng Dou, Rachel J. Harding, Aled M. Edwards, Benjamin Haibe-Kains, Levon Halabelian, Alexander Tropsha, Jin Li
DOI: https://doi.org/10.26434/chemrxiv-2024-xd385
2024-10-18
Abstract: Recent advances in DNA-encoded library (DEL) screening have created bioactivity datasets containing billions of molecules, unlocking new opportunities for machine learning (ML) in drug discovery. However, most ultra-large DELs are proprietary, which limits the advancement of ML tools for big chemical data analytics and hinders the democratization of DEL-ML technology. We address this gap by developing an open, end-to-end DEL-ML framework using public datasets, in which enriched binders are represented by common chemical fingerprints so that proprietary structures remain protected. We demonstrate that ML models can be built and validated on fingerprinted DEL data and then applied to virtual screening (VS) of billion-sized, publicly accessible chemical libraries. As a proof of concept, we screened the human protein WDR91 against the HitGen OpenDEL library (3 billion molecules), trained ML models on the selection data, and used them to screen the Enamine REAL Space library (37 billion molecules). Fifty potential binders were identified, 48 of which were tested; seven were confirmed as novel binders, with dissociation constants (KD) from 2.7 to 21 μM, and were successfully co-crystallized with WDR91. This fully automated, open-source workflow demonstrates the potential of DEL-ML models for discovering novel binders and promotes the use of open chemical bioactivity datasets and ML to accelerate drug discovery.
Chemistry
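
The workflow summarized in the abstract (train a model on fingerprinted DEL selections, then rank a large purchasable library) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state which fingerprints or model types the authors used, so this example substitutes a small Bernoulli naive Bayes classifier written with the Python standard library, toy 16-bit fingerprints stored as sets of on-bit indices, and hypothetical compound names such as `lib-A`. It is not the authors' implementation.

```python
import math

N_BITS = 16  # toy fingerprint length (assumption; real fingerprints are much longer)

def train_bernoulli_nb(fps, labels, n_bits, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary fingerprints.

    fps    : list of sets holding the indices of the on-bits
    labels : list of 0 (not enriched) or 1 (enriched in the selection)
    alpha  : Laplace smoothing constant
    """
    counts = {0: [0] * n_bits, 1: [0] * n_bits}
    totals = {0: 0, 1: 0}
    for fp, y in zip(fps, labels):
        totals[y] += 1
        for bit in fp:
            counts[y][bit] += 1
    model = {}
    for y in (0, 1):
        # Smoothed probability that each bit is on, given the class.
        probs = [(counts[y][b] + alpha) / (totals[y] + 2 * alpha)
                 for b in range(n_bits)]
        prior = (totals[y] + alpha) / (len(labels) + 2 * alpha)
        model[y] = (math.log(prior), probs)
    return model

def score(model, fp, n_bits):
    """Return the log-odds that a fingerprint belongs to the enriched class."""
    log_lik = {}
    for y, (log_prior, probs) in model.items():
        total = log_prior
        for b in range(n_bits):
            total += math.log(probs[b]) if b in fp else math.log(1.0 - probs[b])
        log_lik[y] = total
    return log_lik[1] - log_lik[0]

def top_k(model, library, n_bits, k):
    """Rank (name, fingerprint) pairs by score and return the k best names."""
    ranked = sorted(library, key=lambda item: score(model, item[1], n_bits),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy training set standing in for fingerprinted DEL selection data.
train_fps = [
    {0, 1, 2, 5}, {0, 1, 2, 7}, {0, 1, 3}, {0, 2, 4},      # enriched
    {8, 9, 10}, {9, 10, 11}, {5, 8, 11}, {7, 10, 12},      # not enriched
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Toy screening library standing in for a purchasable chemical space.
library = [
    ("lib-A", {0, 1, 2}),
    ("lib-B", {8, 9, 10}),
    ("lib-C", {0, 1, 9}),
    ("lib-D", {10, 11, 12}),
]

model = train_bernoulli_nb(train_fps, train_labels, N_BITS)
print(top_k(model, library, N_BITS, k=2))
```

Because the model sees only bit vectors, the training structures themselves never need to be shared, which is the data-protection property the abstract describes. At the scale of billions of molecules, scoring would have to be batched and vectorized rather than looped in pure Python as it is here.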