Abstract

Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data to identify targets in silico, reducing the cost and time needed.

Results: We developed a machine learning approach that assigns proteins a druggability score in order to identify novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited functional information can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help identify the most important combinations of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an Area Under the Curve (AUC) of 0.89. Our most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization, and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates.

Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
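
The abstract reports validation performance as an Area Under the Curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen positive (a known target) receives a higher score than a randomly chosen negative (a non-target), counting ties as one half. The function name `roc_auc` and the toy scores and labels are hypothetical and purely illustrative; they are not the authors' code or data.

```python
def roc_auc(scores, labels):
    """Return the ROC AUC of `scores` against binary `labels` (1 = target, 0 = non-target).

    Uses the pairwise-ranking definition: the fraction of (positive, negative)
    pairs in which the positive has the higher score, with ties counted as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical druggability scores for four proteins (illustration only).
toy_scores = [0.9, 0.4, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]  # 1 = known drug target, 0 = non-target
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of targets from non-targets. The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formulation would be preferable for large sets.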

Machine learning assisted hit prioritization for high throughput screening in drug discovery

Machine Learning-Driven Data Valuation for Optimizing High-Throughput Screening Pipelines

Data Valuation: A novel approach for analyzing high throughput screen data using machine learning

High-throughput mechanistic screening of non-equilibrium inhibitors by a fully automated data analysis pipeline in early drug-discovery

Testing the predictive power of reverse screening to infer drug targets, with the help of machine learning

A deep-learning based analysis framework for ultra-high throughput screening time-series data

Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening

Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening

Hit me with your best shot: Integrated hit discovery for the next generation of drug targets

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

Unleashing high content screening in hit detection - Benchmarking AI workflows including novelty detection

Consensus holistic virtual screening for drug discovery: a novel machine learning model approach

Combating small molecule aggregation with machine learning

Deconvoluting Low Yield from Weak Potency in Direct-to-Biology Workflows with Machine Learning

Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI

Machine learning prediction of oncology drug targets based on protein and network properties

Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery

ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery

Accelerating high-throughput virtual screening through molecular pool-based active learning

ChemPrint: An AI-Driven Framework for Enhanced Drug Discovery