Deconvoluting Low Yield from Weak Potency in Direct-to-Biology Workflows with Machine Learning

William McCorkindale, Mihajlo Filep, Nir London, Alpha A. Lee, Emma King-Smith
DOI: https://doi.org/10.26434/chemrxiv-2023-b8wmh-v2
2024-02-09
Abstract: High-throughput, rapid biological evaluation of small molecules is essential to drug discovery and development. Direct-to-Biology (D2B), in which compound purification is forgone, has emerged as a viable technique for time-efficient screening, particularly for PROTAC design and biological evaluation. However, one notable limitation is that reactions must be high-yielding to ensure that the desired compound is indeed responsible for the observed biological activity. Herein, we report a machine-learning-based yield-assay deconfounder capable of deconvoluting low yield from low potency to identify false negatives. We validated this approach by identifying promising SARS-CoV-2 main protease inhibitors with nanomolar activity that rivaled the potency observed from the standard D2B workflow. Furthermore, we show how our framework can be used in a broad in silico screen to produce compounds of similar potency to those from a D2B assay.
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of active compounds being misjudged as inactive because of low reaction yield in the Direct-to-Biology (D2B) workflow. D2B is a rapid method for screening the bioactivity of small molecules without purification, so it relies on high-yielding reactions for measured activity to be attributed to the intended product. The paper proposes a machine-learning-based yield-assay deconfounder that separates low yield from low potency and thereby identifies false negatives. The approach predicts compound bioactivity with two different machine learning models (random forests and Gaussian processes) and can identify potentially active molecules even in low-yield crude reaction mixtures. Experimental validation against the SARS-CoV-2 main protease (Mpro) shows that the approach can find inhibitors with nanomolar activity and can be applied to a broad in silico screen, extending the efficiency and scope of high-throughput D2B screening.
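
The idea of flagging false negatives can be illustrated with a minimal sketch. This is not the authors' method: instead of the random forest and Gaussian process models named above, it swaps in a much simpler similarity-weighted nearest-neighbour predictor over Tanimoto similarity. The fingerprints, activity values, cutoffs, and function names are all hypothetical and chosen only for illustration. A compound is flagged when its crude-mixture readout is low but structurally similar compounds are active, which is one sign that low yield rather than low potency may explain the weak signal.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]                      # set of "on" bits (hypothetical)
Dataset = Dict[str, Tuple[Fingerprint, float]]    # name -> (fingerprint, measured activity)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_activity(name: str, data: Dataset, k: int = 2) -> float:
    """Similarity-weighted mean activity of the k most similar other compounds.

    A stand-in for a trained model; the compound's own readout is excluded.
    """
    query_fp = data[name][0]
    neighbours = sorted(
        ((tanimoto(query_fp, fp), act)
         for other, (fp, act) in data.items() if other != name),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return 0.0
    return sum(sim * act for sim, act in neighbours) / total_weight


def flag_false_negatives(data: Dataset,
                         measured_cutoff: float = 0.3,
                         predicted_cutoff: float = 0.6,
                         k: int = 2) -> List[str]:
    """Names of compounds measured as inactive but predicted to be active."""
    return sorted(
        name for name, (_, act) in data.items()
        if act < measured_cutoff
        and predict_activity(name, data, k) >= predicted_cutoff
    )


# Toy data: activity is a made-up fractional inhibition of the crude mixture.
toy: Dataset = {
    "A": (frozenset({1, 2, 3, 4}), 0.90),
    "B": (frozenset({1, 2, 3, 5}), 0.85),
    "C": (frozenset({1, 2, 3, 6}), 0.10),   # close analogue of A and B, weak readout
    "D": (frozenset({10, 11, 12}), 0.05),
    "E": (frozenset({10, 11, 13}), 0.10),
}

candidates = flag_false_negatives(toy)
print(candidates)
```

In this toy set only compound C is flagged, since its two nearest neighbours are both strongly active while its own readout is weak. In practice a flagged compound would be a candidate for resynthesis and purification before retesting, not a confirmed hit.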