Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery

Davide Boldini, Lukas Friedrich, Daniel Kuhn, Stephan A. Sieber
DOI: https://doi.org/10.1021/acscentsci.3c01517
IF: 18.2
2024-03-15
ACS Central Science
Abstract: Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.
chemistry, multidisciplinary
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses false positive readouts generated during high-throughput screening (HTS) in drug discovery. The authors propose a machine-learning-assisted method, Minimum Variance Sampling Analysis (MVS-A), that simultaneously identifies false positive compounds and prioritizes genuinely bioactive molecules.

#### Main Contributions:

1. **Method Innovation**: MVS-A is a novel formulation of sample influence, computed from the learning dynamics of a gradient boosting model (GBM), that distinguishes compounds producing the desired biological response from those causing assay interference.
2. **No Prior Information Required**: The method does not rely on prior screens or on assumed assay interference mechanisms, so it is applicable to any HTS campaign.
3. **Efficiency**: MVS-A requires less than 30 seconds per assay on low-resource hardware.
4. **Performance Evaluation**: Tests on multiple public and industrial datasets demonstrate MVS-A's effectiveness in eliminating false positives and prioritizing truly active compounds.
5. **Case Study**: A retrospective case study simulating the use of MVS-A in a real drug discovery campaign showcases its practical effectiveness.

#### Research Background:

- HTS plays a crucial role in drug discovery but suffers from false positive readouts.
- False positives arise from various causes, including colloidal aggregation and autofluorescence.
- Current methods mostly rely on assumptions about specific interference mechanisms or on historical data, which limits their applicability.

#### Method Overview:

1. **Train GBM Classifier**: Train on the HTS dataset to distinguish active compounds from inactive ones.
2. **Calculate Sample Influence**: Use MVS-A to calculate an influence score for each active compound.
3. **Ranking and Classification**: Rank the HTS hits by their MVS-A scores to identify false positives and true positives.
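
The three steps in the method overview can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the summary does not give the MVS-A formula, so the per-sample score used here, `sqrt(g^2 + lam * h^2)` averaged over boosting rounds (with `g` and `h` the log-loss gradient and Hessian), is an assumed stand-in. The stump-based booster, the toy fingerprints, the `artifacts` indices, and all function names are also hypothetical. Treating a high score as a sign of a likely artifact is likewise an assumption of this sketch.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares stump on binary features: (feature, value_if_0, value_if_1)."""
    best = None
    for j in range(len(X[0])):
        groups = {0: [], 1: []}
        for row, r in zip(X, residuals):
            groups[row[j]].append(r)
        if not groups[0] or not groups[1]:
            continue  # constant feature, nothing to split on
        means = {b: sum(v) / len(v) for b, v in groups.items()}
        err = sum((r - means[row[j]]) ** 2 for row, r in zip(X, residuals))
        if best is None or err < best[0]:
            best = (err, j, means[0], means[1])
    return best[1:]


def influence_scores(X, y, n_rounds=50, lr=0.3, lam=1.0):
    """Boost stumps on log-loss and average an assumed per-sample score over rounds."""
    base = sum(y) / len(y)
    logits = [math.log(base / (1.0 - base))] * len(y)
    scores = [0.0] * len(y)
    for _ in range(n_rounds):
        probs = [sigmoid(z) for z in logits]
        grads = [p - t for p, t in zip(probs, y)]  # d(log-loss)/d(logit)
        hess = [p * (1.0 - p) for p in probs]
        for i, (g, h) in enumerate(zip(grads, hess)):
            scores[i] += math.sqrt(g * g + lam * h * h)  # assumed stand-in score
        j, v0, v1 = fit_stump(X, [-g for g in grads])  # fit the negative gradient
        logits = [z + lr * (v1 if row[j] else v0) for z, row in zip(logits, X)]
    return [s / n_rounds for s in scores]


# Step 1 input: toy screen with 8 copies of every 3-bit "fingerprint".
# Bit 0 drives true activity in this made-up example.
X = list(product([0, 1], repeat=3)) * 8
y = [row[0] for row in X]

# Hypothetical assay artifacts: four compounds read out as active without bit 0.
artifacts = [0, 1, 2, 3]
for i in artifacts:
    y[i] = 1

# Step 2: score every compound, then keep only the measured actives.
scores = influence_scores(X, y)
active_idx = [i for i, label in enumerate(y) if label == 1]

# Step 3: rank actives, highest score first (assumed to mean "most suspicious").
ranked = sorted(active_idx, key=lambda i: scores[i], reverse=True)
print("most suspicious actives:", ranked[:4])
```

In this toy setup the mislabeled actives stay hard to fit throughout training, so their accumulated gradient magnitude remains larger than that of the consistent actives. A real application would replace the toy booster with a library GBM and the 3-bit rows with molecular fingerprints or descriptors.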
#### Experimental Results:

- MVS-A outperforms existing rule-based baselines and other data-driven methods across multiple benchmarks.
- The case study shows that MVS-A effectively identifies compounds with higher biological relevance and does not favor chemical groups that are less amenable to further pharmacological optimization.

In summary, this paper aims to make the HTS process more efficient and accurate through MVS-A, thereby accelerating drug discovery.