Consensus holistic virtual screening for drug discovery: a novel machine learning model approach

Said Moshawih, Zhen Hui Bu, Hui Poh Goh, Nurolaini Kifli, Lam Hong Lee, Khang Wen Goh, Long Chiau Ming

DOI: https://doi.org/10.1186/s13321-024-00855-8

2024-05-30

Journal of Cheminformatics

Abstract: In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study presents a novel pipeline that employs machine learning models to amalgamate various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods: QSAR, pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula "w_new", consensus scores were calculated, and an enrichment study was performed for each target. Distinctively, consensus scoring outperformed other methods for specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Remarkably, this approach consistently prioritized compounds with higher experimental pIC50 values compared to all other screening methodologies. Moreover, the models demonstrated moderate to high performance in terms of \(R^2\) values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, where both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.
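The abstract does not state how the four method scores are merged into one consensus score. As a minimal sketch of the general idea, assuming a weighted mean of per-method z-scores, the Python below combines hypothetical QSAR, pharmacophore, docking, and 2D shape similarity scores for four compounds. The function names, the weights, the sign convention for docking, and all numbers are illustrative assumptions, not values or formulas from the paper.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize raw scores to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A method that scores every compound identically carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_scores(method_scores, weights, higher_is_better):
    """Weighted mean of per-method z-scores, one value per compound.

    method_scores:    dict of method name -> list of raw scores
                      (all lists in the same compound order)
    weights:          dict of method name -> non-negative weight (assumed)
    higher_is_better: dict of method name -> bool; False flips the sign,
                      e.g. for docking energies where lower is better
    """
    total_weight = sum(weights[name] for name in method_scores)
    n_compounds = len(next(iter(method_scores.values())))
    combined = [0.0] * n_compounds
    for name, raw in method_scores.items():
        sign = 1.0 if higher_is_better[name] else -1.0
        for i, z in enumerate(zscores(raw)):
            combined[i] += weights[name] * sign * z / total_weight
    return combined


# Hypothetical scores for four compounds (not data from the paper).
scores = {
    "qsar": [7.2, 5.1, 6.3, 4.0],           # predicted pIC50
    "pharmacophore": [0.9, 0.4, 0.7, 0.2],  # fit score
    "docking": [-9.5, -6.0, -8.1, -5.2],    # energy, lower is better
    "shape_2d": [0.8, 0.3, 0.6, 0.1],       # similarity to a reference
}
# Placeholder weights; the paper derives its own model ranking via "w_new".
weights = {"qsar": 0.4, "pharmacophore": 0.2, "docking": 0.3, "shape_2d": 0.1}
direction = {"qsar": True, "pharmacophore": True, "docking": False, "shape_2d": True}

consensus = consensus_scores(scores, weights, direction)
# Compound indices ordered from best to worst consensus score.
ranking = sorted(range(len(consensus)), key=lambda i: consensus[i], reverse=True)
print(ranking)
```

Standardizing before averaging keeps a method with a wide numeric range, such as docking energies, from dominating methods bounded between 0 and 1.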

chemistry, multidisciplinary, computer science, interdisciplinary applications, information systems

What problem does this paper attempt to address?

The main objective of this paper is to propose a new consensus integrated virtual screening method to improve the efficiency and accuracy of compound screening in the drug discovery process. Specifically, the research team developed a machine learning model workflow that combines multiple traditional screening methods (quantitative structure-activity relationship (QSAR), pharmacophore modeling, docking, and 2D shape similarity) and introduced a new evaluation metric, "w_new", to refine the ranking of these models. Below is a summary of the key issues addressed in this study:

1. **Developing an integrated virtual screening workflow**: The study proposed a novel workflow that integrates different virtual screening methods (QSAR, pharmacophore analysis, molecular docking, and 2D shape similarity) through machine learning models to enhance the ability to identify potential active compounds.
2. **Introducing a new evaluation metric**: "w_new" is an evaluation metric that assesses and ranks machine learning models by jointly considering statistical measures such as the coefficient of determination (\(R^2\)), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) on the training and validation sets.
3. **Optimizing model performance**: Using the "w_new" formula, the researchers determined the best-performing machine learning model for each protein target and used this model to score compounds.
4. **Improving screening effectiveness**: By comparing consensus scoring against the single screening methods, the study demonstrated that the consensus method achieved better results on specific protein targets (e.g., PPARG and DPP4), as evidenced by higher area under the curve (AUC) values.
5. **External validation**: The study also evaluated the predictive performance and generalization ability of the models through validation on external datasets.

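This summary names the inputs to "w_new" (\(R^2\), MSE, RMSE, and MAE on the training and validation sets) but does not give the formula itself. The sketch below therefore does not reproduce "w_new". It only illustrates, under stated assumptions, how those four metrics can be computed and folded into one ranking value. The function `illustrative_score`, its weighting, and the toy predictions are all hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for paired true/predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def illustrative_score(train, valid):
    """Hypothetical composite score; this is NOT the paper's w_new formula.

    Rewards validation R^2 and penalizes validation error as well as the
    train/validation R^2 gap (a rough proxy for overfitting).
    """
    error_penalty = valid["mse"] + valid["rmse"] + valid["mae"]
    overfit_gap = abs(train["r2"] - valid["r2"])
    return valid["r2"] / (1.0 + error_penalty + overfit_gap)


# Toy pIC50-like targets and predictions from two hypothetical models.
y_train = [5.0, 6.0, 7.0, 8.0]
y_valid = [5.5, 6.5, 7.5]
predictions = {
    "model_a": {"train": [5.1, 5.9, 7.1, 7.9], "valid": [5.4, 6.7, 7.4]},
    # model_b fits the training set perfectly but predicts a constant
    # on the validation set, so its validation R^2 is 0.
    "model_b": {"train": [5.0, 6.0, 7.0, 8.0], "valid": [6.5, 6.5, 6.5]},
}

ranked = {}
for name, pred in predictions.items():
    train_m = regression_metrics(y_train, pred["train"])
    valid_m = regression_metrics(y_valid, pred["valid"])
    ranked[name] = illustrative_score(train_m, valid_m)

best_model = max(ranked, key=ranked.get)
print(best_model, ranked)
```

Whatever the exact formula, a score built from both training and validation metrics lets a model that generalizes (here `model_a`) outrank one that merely memorizes the training set.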
In summary, this study aims to improve the accuracy and efficiency of compound screening in the drug discovery process by combining multiple virtual screening techniques and employing advanced machine learning methods.
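The AUC values quoted for PPARG and DPP4 measure how well a scoring method ranks actives above decoys. As a minimal sketch, the code below computes ROC AUC through its pairwise interpretation (the probability that a randomly chosen active outscores a randomly chosen decoy) and adds an enrichment factor for the top fraction of the ranked list. The enrichment factor is a common virtual screening metric included here as an assumption; this summary does not say which enrichment measure the paper uses. The scores and labels are made up.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    labels: 1 for active, 0 for decoy. Ties count as half a correct pair.
    """
    actives = [s for s, lab in zip(scores, labels) if lab == 1]
    decoys = [s for s, lab in zip(scores, labels) if lab == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Active rate in the top `fraction` of the ranking over the overall rate."""
    n = len(scores)
    top_n = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top = sum(labels[i] for i in order[:top_n])
    return (hits_in_top / top_n) / (sum(labels) / n)


# Hypothetical consensus scores for ten compounds: 3 actives, 7 decoys.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(example_scores, example_labels)
ef_20 = enrichment_factor(example_scores, example_labels, 0.2)
print(f"AUC = {auc:.3f}, EF(20%) = {ef_20:.2f}")
```

The pairwise loop is quadratic in the number of compounds, which is fine for illustration; large libraries would normally use a rank-based implementation instead.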

A Case-Based Meta-Learning Algorithm Boosts the Performance of Structure-Based Virtual Screening.

Consensus scoring criteria for improving enrichment in virtual screening

Streamlining Computational Fragment-Based Drug Discovery Through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening

Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening

Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening

MILCDock: Machine Learning Enhanced Consensus Docking for Virtual Screening in Drug Discovery

How Does Consensus Scoring Work for Virtual Library Screening? An Idealized Computer Experiment.

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets

Recent progress on the prospective application of machine learning to structure-based virtual screening

ESSENCE-Dock: A Consensus-Based Approach to Enhance Virtual Screening Enrichment in Drug Discovery

Novel Big Data-Driven Machine Learning Models for Drug Discovery Application

Machine learning accelerates pharmacophore-based virtual screening of MAO inhibitors

DockM8: An All-in-One Open-Source Platform for Consensus Virtual Screening in Drug Design

Consensus models for CDK5 inhibitors in silico and their application to inhibitor discovery

Deep Learning in Virtual Screening: Recent Applications and Developments

Establishing the foundations for a data-centric AI approach for virtual drug screening through a systematic assessment of the properties of chemical data

Virtual Screening on Natural Products for Discovering Active Compounds and Target Information

Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.