Abstract: In silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, all based on network propagation, to identify genes that have been targeted by any drug, using gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. To obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance to below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and of the definition of seed disease genes.

The use of biological network data has proven its effectiveness in many areas of computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence. To find new candidate nodes for a given biological property, so-called network propagation algorithms start from the set of nodes known to have that property and leverage the connections in the biological network to make predictions. Here, we assess the performance of several network propagation algorithms at finding sensible gene targets for 22 common non-cancerous diseases, i.e. genes that have been found promising enough to start clinical trials with any compound. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be assayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results support that network propagation is still a viable approach to find drug targets, but that special care must be taken with the validation strategy. Algorithms benefitted from the use of a larger, although noisier, network and from direct evidence data rather than indirect genetic associations with disease.
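The ideas summarised above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a random walk with restart as one representative network propagation method, a "hits within the top k" count as the practical metric, and a simple grouping rule that keeps all members of a protein complex in the same cross-validation fold. The grouping rule is only one way to make folds complex-aware and is not claimed to match either of the two schemes introduced in the paper. All function names, parameters and the toy network are hypothetical.

```python
import numpy as np


def propagate(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative choice)."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=0)
    degree[degree == 0] = 1.0            # isolated nodes: avoid division by zero
    transition = adj / degree            # column j is divided by degree[j]
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()                   # restart distribution over seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def top_k_hits(scores, is_target, candidates, k=20):
    """Count true targets among the k best-scoring candidate genes."""
    ranked = sorted(candidates, key=lambda gene: scores[gene], reverse=True)
    return sum(bool(is_target[gene]) for gene in ranked[:k])


def complex_aware_folds(complex_of, n_folds, seed=0):
    """Assign genes to folds so that a protein complex is never split.

    complex_of[i] is a complex label for gene i, or None if gene i
    belongs to no known complex (it then forms its own group).
    """
    rng = np.random.default_rng(seed)
    groups = {}
    for gene, label in enumerate(complex_of):
        key = label if label is not None else ("single", gene)
        groups.setdefault(key, []).append(gene)
    keys = list(groups)
    folds = np.empty(len(complex_of), dtype=int)
    for position, idx in enumerate(rng.permutation(len(keys))):
        folds[groups[keys[idx]]] = position % n_folds
    return folds


# Toy network (hypothetical): six genes connected in a chain 0-1-2-3-4-5.
n_genes = 6
adj = np.zeros((n_genes, n_genes))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[a, b] = adj[b, a] = 1.0

seeds = [1, 0, 0, 0, 0, 0]               # gene 0 is the only known target
scores = propagate(adj, seeds)

is_target = [1, 1, 0, 0, 0, 1]           # held-out truth for the toy example
hits = top_k_hits(scores, is_target, candidates=range(1, n_genes), k=2)

complex_of = ["A", "A", None, "B", "B", None]
folds = complex_aware_folds(complex_of, n_folds=3)
```

In this toy setting, genes closer to the seed receive higher scores, and genes sharing a complex label always land in the same fold, so a held-out gene cannot be recovered merely because another member of its complex was used as a seed.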

Benchmarking network propagation methods for disease gene identification
