Abstract

Motivation
In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach does not scale with the number of annotations, and the relationship between gene–drug association patterns and biological context may not be obvious.

Results
We present a procedure to compare cell lines based on their gene–drug association patterns. Starting with a grouping of cell lines from biological annotations, we model the gene–drug association pattern of each group as a bipartite graph between genes and drugs. To do so, we apply sparse canonical correlation analysis (SCCA) to extract gene–drug associations and use the canonical vectors to construct the edge weights. We then introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test that evaluates the significance of the similarity between cell line groups in terms of gene–drug associations. In the pharmacogenomic datasets CTRP2, GDSC2 and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses depend linearly on expression, our approach is the only one among the compared approaches that effectively infers the true hierarchy.

Availability and implementation
Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. The source code is available at https://github.com/CalvinTChi/hierBipartite. The datasets were derived from sources in the public domain: the Cancer Cell Line Encyclopedia (https://portals.broadinstitute.org/ccle), the Cancer Therapeutics Response Portal (https://portals.broadinstitute.org/ctrp.v2.1/?page=#ctd2BodyHome), and the Genomics of Drug Sensitivity in Cancer (https://www.cancerrxgene.org/). These datasets can be downloaded using the PharmacoGx R package (https://bioconductor.org/packages/release/bioc/html/PharmacoGx.html).

Supplementary information
Supplementary data are available at Bioinformatics online.
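To make the pipeline in the Results concrete (SCCA per cell line group, bipartite edge weights from the canonical vectors, then a nuclear norm-based dissimilarity between groups), here is a minimal NumPy sketch under stated assumptions. It is not the hierBipartite implementation. The assumptions are as follows. The SCCA is a simplified penalized-matrix-decomposition-style variant that treats within-set covariances as identity and uses fixed soft-threshold penalties `lam_u` and `lam_v`. Edge weights are taken as W = Σ_k u_k v_kᵀ. The dissimilarity is assumed to be ‖W₁ − W₂‖_* / (‖W₁‖_* + ‖W₂‖_*), which lies in [0, 1] by the triangle inequality. All function names and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding (proximal operator of the L1 penalty)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit(a):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def standardize(M):
    """Center each column and scale it to unit (population) variance."""
    M = M - M.mean(axis=0)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return M / sd


def sparse_cca(X, Y, n_comp=1, lam_u=0.3, lam_v=0.3, n_iter=100):
    """Simplified sparse CCA (an assumption, not the package's algorithm).

    Within-set covariances are treated as identity. Each component comes from
    soft-thresholded power iterations on the gene-by-drug correlation matrix,
    which is deflated before the next component is extracted.
    Returns canonical vectors U (genes x n_comp) and V (drugs x n_comp)."""
    C = standardize(X).T @ standardize(Y) / X.shape[0]  # sample correlations
    U = np.zeros((C.shape[0], n_comp))
    V = np.zeros((C.shape[1], n_comp))
    for k in range(n_comp):
        v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
        for _ in range(n_iter):
            u = unit(soft_threshold(C @ v, lam_u))
            v = unit(soft_threshold(C.T @ u, lam_v))
        C = C - (u @ C @ v) * np.outer(u, v)  # remove this component
        U[:, k], V[:, k] = u, v
    return U, V


def edge_weights(U, V):
    """Bipartite gene-drug edge weights, assumed to be W = sum_k u_k v_k^T."""
    return U @ V.T


def nuclear_dissimilarity(W1, W2):
    """Assumed normalized nuclear-norm dissimilarity in [0, 1]."""
    denom = np.linalg.norm(W1, "nuc") + np.linalg.norm(W2, "nuc")
    return np.linalg.norm(W1 - W2, "nuc") / denom if denom > 0 else 0.0


def simulate_group(rng, genes, drugs, n=100, p=20, q=10, noise=0.5):
    """Hypothetical group: responses of `drugs` depend linearly on `genes`."""
    X = rng.standard_normal((n, p))
    B = np.zeros((p, q))
    B[np.ix_(genes, drugs)] = 1.0
    Y = X @ B + noise * rng.standard_normal((n, q))
    return X, Y


rng = np.random.default_rng(0)
groups = {
    "A": simulate_group(rng, [0, 1, 2, 3], [0, 1, 2, 3]),
    "B": simulate_group(rng, [0, 1, 2, 3], [0, 1, 2, 3]),
    "C": simulate_group(rng, [10, 11, 12, 13], [5, 6, 7, 8]),
}
names = list(groups)
W = {g: edge_weights(*sparse_cca(X, Y)) for g, (X, Y) in groups.items()}
D = np.array([[nuclear_dissimilarity(W[a], W[b]) for b in names] for a in names])
print(names)
print(np.round(D, 2))
```

In this toy simulation, groups A and B share the same linear gene-to-drug dependence, while C involves different genes and drugs. The dissimilarity between A and B is therefore expected to fall well below that between A and C. A symmetric matrix such as `D` could then be passed to an off-the-shelf agglomerative method, for example SciPy's `linkage` applied to `squareform(D)`, to build a hierarchy of groups.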