Abstract

Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing amount of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and time needed.

Results: We developed a machine learning approach that assigns a druggability score to proteins in order to identify novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. An advantage of this approach is that it is unbiased: even less-studied proteins with limited functional annotation can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help identify the combination of features that best differentiates validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy with an area under the curve (AUC) of 0.89. Our most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates.

Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets.
We have shown that the proposed method is highly predictive on a validation dataset of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug target predictions by combining generic druggability features with indication-specific biological functions.
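The train-then-validate workflow summarized in the abstract can be illustrated with a minimal, hypothetical sketch. It assumes a logistic-regression classifier as a stand-in (the abstract does not name the specific algorithms used), replaces the 70 real protein features with synthetic three-feature vectors, and computes AUC with the Mann-Whitney rank formulation. The helper names `train_logistic`, `score`, `roc_auc` and `make_toy_data` are illustrative only and do not come from the study.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: a logistic-regression druggability scorer trained on
# feature vectors of known targets (label 1) vs. non-targets (label 0).
# All data here is synthetic; it is not the study's feature set or results.


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (prediction - label) * feature.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Druggability-like score in [0, 1] for one protein's feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def make_toy_data(n: int, shift: float, rng: random.Random) -> List[List[float]]:
    # Synthetic 3-feature vectors standing in for, e.g., a centrality, an
    # essentiality and a tissue-specificity value; positives are shifted.
    return [[rng.gauss(shift, 1.0) for _ in range(3)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    train_pos = make_toy_data(50, 1.0, rng)   # "approved drug targets"
    train_neg = make_toy_data(50, 0.0, rng)   # "non-drug targets"
    w, b = train_logistic(train_pos + train_neg,
                          [1] * len(train_pos) + [0] * len(train_neg))

    # Independent validation set, playing the role of clinical-trial targets.
    val_pos = make_toy_data(30, 1.0, rng)
    val_neg = make_toy_data(30, 0.0, rng)
    auc = roc_auc([score(w, b, x) for x in val_pos],
                  [score(w, b, x) for x in val_neg])
    print(f"validation AUC: {auc:.2f}")
```

Because `roc_auc` only compares ranks, the same helper can also be read as a rank-based comparison of two score distributions, for example scores of oncology trial-drug targets against scores of trial-drug targets for other indications.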

Testing the predictive power of reverse screening to infer drug targets, with the help of machine learning

Learning the Drug Target-Likeness of A Protein

Does Drug-Target Have A Likeness?

Machine learning prediction of oncology drug targets based on protein and network properties

Machine learning assisted hit prioritization for high throughput screening in drug discovery

Machine learning for target discovery in drug development

Drug Target Identification with Machine Learning: How to Choose Negative Examples

Analysis of protein features and machine learning algorithms for prediction of druggable proteins

In silico prediction of novel therapeutic targets using gene–disease association data

A machine learning model trained on a high-throughput antibacterial screen increases the hit rate of drug discovery

Consensus holistic virtual screening for drug discovery: a novel machine learning model approach

Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening

Artificial Intelligence/Machine Learning-Driven Small Molecule Repurposing via Off-Target Prediction and Transcriptomics

Large-Scale Off-Target Identification Using Fast and Accurate Dual Regularized One-Class Collaborative Filtering and Its Application to Drug Repurposing

Machine Learning Scoring Functions for Drug Discoveries from Experimental and Computer-Generated Protein-Ligand Structures: Towards Per-Target Scoring Functions

Attention-based approach to predict drug-target interactions across seven target superfamilies

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

Predicting Polypharmacology by Binding Site Similarity: From Kinases to the Protein Universe

Toward more realistic drug-target interaction predictions

Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction