DEL+ML paradigm for actionable hit discovery – a cross DEL and cross ML model assessment.

Sumaiya Iqbal, Wei Jiang, Eric Hansen, Tonia Aristotelous, Shuang Liu, Andrew Reidenbach, Cerise Raffier, Alison Leed, Chengkuan Chen, Lawrence Chung, Eric Sigel, Alex Burgin, Sandy Gould, Holly Soutter
DOI: https://doi.org/10.26434/chemrxiv-2024-2xrx4
2024-07-24
Abstract: DNA-Encoded Library (DEL) technology allows millions, or even billions, of encoded compounds to be screened in a pooled fashion, which is faster and cheaper than traditional approaches. The resulting large datasets of DEL binders and not-binders to a target of interest enable the development of Machine Learning (ML) models that can screen large, readily accessible, drug-like libraries in an ultra-high-throughput fashion. Here, we report a comparative assessment of the DEL+ML pipeline for hit discovery using three DELs and five ML models (fifteen DEL+ML combinations, using two different feature representations). Each ML model was used to screen a diverse set of drug-like compound collections to identify orthosteric binders of two therapeutic targets, casein kinase 1α and 1δ (CK1α/δ). Overall, 10% of the predicted binders and 94% of the predicted not-binders were confirmed in biophysical assays, including two nanomolar binders (187 nM and 69.6 nM affinity for CK1α and CK1δ, respectively). Our study provides insights into the DEL+ML paradigm for hit discovery: the importance of an ensemble ML approach in identifying a diverse set of confirmed binders, the usefulness of large training data and chemical diversity in the DEL, and the significance of model generalizability over accuracy. We have shared our results via an open-source repository to support further use and the development of similar efforts.
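The abstract credits an ensemble ML approach with recovering a more diverse set of confirmed binders, and it reports confirmation rates for predicted binders and not-binders. The sketch below shows one simple way such a cross-model assessment could be organized. It assumes the ensemble pools (takes the union of) the binders predicted by each DEL+ML combination; the paper's actual aggregation scheme is not described here. The functions `ensemble_union`, `unique_hits`, and `confirmation_rate`, as well as the model names and compound IDs, are hypothetical and used only for illustration. The confirmation rate is the fraction of predicted compounds whose label the assay confirmed, which is how the 10% (binders) and 94% (not-binders) figures above read.

```python
from typing import Dict, Set


def ensemble_union(predictions: Dict[str, Set[str]]) -> Set[str]:
    """Pool the predicted binders of every DEL+ML combination (set union)."""
    pooled: Set[str] = set()
    for hits in predictions.values():
        pooled |= hits
    return pooled


def unique_hits(predictions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each model, return the predicted binders that no other model predicted."""
    result: Dict[str, Set[str]] = {}
    for name, hits in predictions.items():
        others: Set[str] = set()
        for other_name, other_hits in predictions.items():
            if other_name != name:
                others |= other_hits
        result[name] = hits - others
    return result


def confirmation_rate(predicted: Set[str], confirmed: Set[str]) -> float:
    """Fraction of predicted compounds whose label was confirmed by the assay."""
    if not predicted:
        return 0.0
    return len(predicted & confirmed) / len(predicted)


if __name__ == "__main__":
    # Hypothetical predictions from three DEL+ML combinations (illustrative only).
    predictions = {
        "DEL1+model_A": {"cpd1", "cpd2", "cpd3"},
        "DEL2+model_B": {"cpd3", "cpd4"},
        "DEL3+model_C": {"cpd5"},
    }
    confirmed_binders = {"cpd3", "cpd5"}  # hypothetical assay outcome

    pooled = ensemble_union(predictions)
    print(sorted(pooled))                                 # all five compounds
    print(confirmation_rate(pooled, confirmed_binders))   # 2 of 5 confirmed -> 0.4
    unique = unique_hits(predictions)
    for name, hits in predictions.items():
        rate = confirmation_rate(hits, confirmed_binders)
        print(name, round(rate, 2), sorted(unique[name]))
```

In this toy example, `DEL3+model_C` is the only model that predicts `cpd5`, a confirmed binder, so dropping it from the ensemble would lose that hit even though the pooled confirmation rate (0.4) is lower than its individual rate (1.0). The same `confirmation_rate` function applies unchanged to predicted not-binders and assay-confirmed not-binders.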
Chemistry