Abstract

Introduction: We present a novel method to identify cancer driver genes that jointly examines any number of diverse transcriptomic alterations, with the goal of uncovering highly recurrent and heterogeneous patterns in 1190 samples across 26 cancer types as part of the PanCancer Analysis of Whole Genomes (PCAWG) of the International Cancer Genome Consortium (ICGC).

Motivation: Previous pan-cancer genomic studies have focused on the analysis of somatic mutations as the driver of phenotypic changes. Here, we propose a method to integrate a wide variety of RNA and DNA changes to redefine the concept of driver events and account for the transcriptome’s role in tumorigenesis. PTK2 provides a motivating example, since it has many RNA alterations that correlate with patient survival, such as overexpression, exon skips, and alternative promoter usage. In our analysis, we integrate an unprecedented variety of alterations, including gene fusions, RNA editing, alternative splicing, expression outliers, alternative promoters, allele-specific expression, and somatic mutations. This also enables us to identify mutually exclusive (MutE) and co-occurring (CoO) patterns between different types of alterations within a gene.

Methods: Our method has three main strengths: flexibility to handle any number or type of alteration; sensitivity to different frequencies of alterations, so that rare events are not lost in the recurrence analysis; and a ranking that rewards diversity, so that genes with multiple alteration types are prioritized. Our method is summarized in two steps: 1) Identify genes that are both recurrently and heterogeneously altered across many samples by calculating a rank-based score for each gene. 2) Identify MutE and CoO patterns between alteration types for the genes identified in the previous step.
To ensure that alterations were comparable, we applied a thresholding model to binarize all alterations for gene-sample pairs, allowing us to account for the properties of the different modalities involved. Step 1 of our method calculates a score for each gene that takes into account: 1) the number of alterations to a gene across all samples, 2) the rarity of each alteration, and 3) how many types of alterations are observed per gene. The score is then used to rank the genes, and the top-ranked genes are considered for MutE and CoO analyses.

Results: Our top 100 ranked genes were highly enriched for cancer census genes (adjusted p-value: 2.06e-9), indicating that we identify cancer-relevant genes. Our top five ranked cancer census genes were IGF2, ERBB2, RARA, CREBBP, and ARID1A, all of which had at least 4 of the 7 possible alteration types, showing that our scoring method prioritizes genes with diverse alterations. We also found that alternative promoter usage and alternative splicing were highly co-occurring alterations, with PTK2 having the highest co-occurrence between them. In summary, we propose a new method to analyze various RNA disruptions and show that it can yield new insights beyond genomic variation.

Citation Format: Natalie R. Davidson, PanCancer Analysis of Whole Genomes 3 (PCAWG-3) for ICGC, Alvis Brazma, Angela N. Brooks, Claudia Calabrese, Nuno A. Fonseca, Jonathan Goke, Yao He, Xueda Hu, Andre Kahles, Kjong-Van Lehmann, Fenglin Liu, Gunnar Ratsch, Siliang Li, Roland F. Schwarz, Mingyu Yang, Zemin Zhang, Fan Zhang, Liangtao Zheng. Integrating diverse transcriptomic alterations to identify cancer-relevant genes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 389. doi:10.1158/1538-7445.AM2017-389
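
The abstract describes Step 1 as a rank-based score that reflects recurrence, rarity, and the number of alteration types per gene, but it does not give the formula. The following is a minimal sketch of one way such a score could be built, not the authors' implementation. It assumes each alteration type has already been binarized into a set of (gene, sample) pairs, ranks genes by recurrence within each type (so rare alteration types are not swamped by frequent ones), and sums the per-type percentile ranks (so genes altered in several ways score higher). The function names, the threshold, and the toy gene and sample identifiers are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def binarize(values, threshold):
    """Keep the (gene, sample) pairs whose absolute value reaches the threshold.

    `values` maps (gene, sample) -> a continuous measurement. The threshold is
    a hypothetical per-alteration-type cutoff, not one taken from the abstract.
    """
    return {pair for pair, value in values.items() if abs(value) >= threshold}


def rank_scores(altered):
    """Sum per-type percentile ranks of recurrence for every gene.

    `altered` maps alteration type -> set of binarized (gene, sample) pairs.
    This is an assumed stand-in for the rank-based score in the abstract.
    """
    scores = defaultdict(float)
    for pairs in altered.values():
        # Recurrence: number of samples in which each gene carries this alteration.
        counts = defaultdict(int)
        for gene, _sample in pairs:
            counts[gene] += 1
        sorted_counts = sorted(counts.values())
        for gene, count in counts.items():
            # Fraction of altered genes whose recurrence is <= this gene's;
            # genes with tied counts share the same percentile rank.
            scores[gene] += bisect_right(sorted_counts, count) / len(sorted_counts)
    return dict(scores)


def rank_genes(scores):
    """Order genes from highest to lowest score, breaking ties by name."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Toy input only: gene and sample identifiers are made up for illustration.
altered = {
    "fusion": {("GENE_B", "s1"), ("GENE_B", "s2"), ("GENE_A", "s1")},
    "splicing": {("GENE_C", "s1"), ("GENE_C", "s3"), ("GENE_B", "s3")},
    "alt_promoter": {("GENE_C", "s2")},
}
scores = rank_scores(altered)
print(rank_genes(scores))
```

In this toy input, GENE_C ranks first because it is the most recurrent gene in two alteration types, including the rare `alt_promoter` type; a score based on raw counts alone would tie it with GENE_B, which is altered in the same number of samples.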
