Abstract:

Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data for in silico target identification, reducing the cost and time required.

Results: We developed a machine learning approach that assigns proteins a druggability score in order to identify novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited information about their function can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets of approved drugs (the positive set) and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an area under the curve (AUC) of 0.89. The most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates.

Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug target predictions by combining generic druggability features with indication-specific biological functions.
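
To make the reported validation metric concrete, the following is a minimal sketch of how an AUC can be computed from druggability scores, assuming the AUC refers to the area under the ROC curve. It uses the rank-based interpretation: the AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores and labels are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a known drug target (positive), 0 for a non-target (negative).
    Returns the fraction of positive/negative pairs ranked correctly,
    counting tied scores as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical druggability scores for six proteins (illustrative only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 1, 0, 0]

toy_auc = roc_auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation would be the usual choice for large validation sets.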

Accurate Prediction of Potential Druggable Proteins Based on Genetic Algorithm and Bagging-SVM Ensemble Classifier

DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins

Learning the Drug Target-Likeness of A Protein

Analysis of protein features and machine learning algorithms for prediction of druggable proteins

Predicting Protein-Ligand Interactions Based on Bow-Pharmacological Space and Bayesian Additive Regression Trees

Support vector machines approach for predicting druggable proteins: recent progress in its exploration and investigation of its usefulness.

Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines.

XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set

An ensemble method for predicting and designing of druggable proteins.

Prediction of druggable proteins using machine learning and functional enrichment analysis: a focus on cancer-related proteins and RNA-binding proteins

Prediction of P-Glycoprotein Substrates by a Support Vector Machine Approach

Prediction of Potential Drug Targets Based on Simple Sequence Properties

In silico prediction of drug-target interaction networks based on drug chemical structure and protein sequences

Drug-target affinity prediction method based on consistent expression of heterogeneous data

A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network

PINNED: identifying characteristics of druggable human proteins using an interpretable neural network

Identification of Human Protein Drug Targets Homologues with Data Mining

Prediction of Protein-Protein Interaction Sites Using an Ensemble Method

Machine learning prediction of oncology drug targets based on protein and network properties

Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses

Drug-Target Affinity Prediction Based on Improved GraphDTA