Abstract: Machine learning is rapidly advancing drug discovery, improving both its speed and its efficiency. Innovation in computer-aided drug design is driven primarily by structure-based and ligand-based approaches. When few inhibitors are known for a target, data augmentation is often used to improve model performance. In this study, we developed predictive machine learning models for structure-based drug discovery, using multiple traditional machine learning algorithms trained on datasets that capture both target and ligand dynamics. To illustrate the approach, we present a composite model that combines classification and regression to predict YTHDF1 inhibitors from protein-ligand extended connectivity (PLEC) fingerprint features. YTHDF1, a key m6A reader protein involved in mRNA translation, is implicated in various cancers, making it a promising therapeutic target. Traditional structure-based virtual screening (SBVS) with generic scoring functions has struggled to identify potent YTHDF1 inhibitors because of the protein's unique binding characteristics. To overcome this, we developed YTHDF1-specific machine learning scoring functions (MLSFs) to improve SBVS efficacy. We applied several data augmentation techniques to build a comprehensive dataset that incorporates multiple conformations of both the ligands and the YTHDF1 protein. We trained 64 YTHDF1-specific MLSFs using four machine learning algorithms and evaluated their predictive and ranking power on ten test sets. Our results show that the artificial neural network trained on PLEC fingerprints (ANN-PLEC) outperforms the other MLSFs, achieving an area under the precision-recall curve (PR-AUC) of 0.87. The method shows promise for targets with few known active molecules, providing a viable path forward for drug discovery research.
The ANN-PLEC scoring function is freely available on GitHub at https://github.com/JuniML/SBVS-YTHDF1/.
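For readers unfamiliar with the PR-AUC metric reported above, the following is a minimal sketch of how it can be estimated from ranked predictions, using the average-precision formulation. It is not taken from the study's code: the function name `average_precision`, the toy labels, and the toy scores are all hypothetical, and the sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Estimate PR-AUC as average precision.

    labels: 1 for an active (inhibitor), 0 for an inactive.
    scores: predicted scores, higher meaning more likely active.
    Assumption: scores contain no ties.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Rank molecules from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this active is recovered.
            precision_sum += true_positives / rank

    # Mean of the precisions at each recovered active.
    return precision_sum / n_actives


# Hypothetical toy screen: 3 actives among 8 molecules.
toy_labels = [1, 0, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_pr_auc = average_precision(toy_labels, toy_scores)
```

Unlike ROC-AUC, the expected PR-AUC of a random ranking is roughly the fraction of actives in the test set (0.375 in this toy case), so a PR-AUC value is best interpreted against the class balance of the set it was measured on.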
