A comprehensive study design reveals treatment- and transcript abundance–dependent concordance between RNA-seq and microarray data
Charles Wang, Binsheng Gong, Pierre R. Bushel, Jean Thierry-Mieg, Danielle Thierry-Mieg, Joshua Xu, Hong Fang, Huixiao Hong, Jie Shen, Zhenqiang Su, Joe Meehan, Xiaojin Li, Lu Yang, Haiqing Li, Paweł P. Łabaj, David P. Kreil, Dalila Megherbi, Caiment Florian, Stan Gaj, Joost van Delft, Jos Kleinjans, Andreas Scherer, Devanarayan Viswanath, Jian Wang, Yong Yang, Hui-Rong Qian, J. Lee Lancashire, Marina Bessarabova, Yuri Nikolsky, Cesare Furlanello, Marco Chierici, Davide Albanese, Giuseppe Jurman, Samantha Riccadonna, Michele Filosi, Roberto Visintainer, Ke K. Zhang, Jianying Li, Jui-Hua Hsieh, L. Daniel Svoboda, James C. Fuscoe, Youping Deng, Leming Shi, Richard S. Paules, Scott S. Auerbach, Weida Tong
2014-01-01
Abstract: RNA-seq facilitates unbiased genome-wide gene-expression profiling. However, its concordance with the well-established microarray platform must be rigorously assessed for confident use in clinical and regulatory applications. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same set of liver samples of rats under varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOA). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is highly correlated with treatment effect size, gene-expression abundance and the biological complexity of the MOA. RNA-seq outperforms microarray (90% versus 76%) in DEG verification by quantitative PCR, and the main gain is its improved accuracy for weakly expressed genes. Nonetheless, predictive classifiers derived from both platforms performed similarly. Therefore, the endpoint studied and its biological complexity, transcript abundance, and intended application are important factors in transcriptomic research and for decision-making.
Emerging technologies facilitate basic science research, but their value in clinical and regulatory settings requires rigorous assessment and consensus within the research community. The U.S. Food and Drug Administration’s (FDA’s) initiative on advancing regulatory science embraces collaborations among various stakeholders to expedite translation of advances in basic science to regulatory application1. In the past decade, microarrays have been a principal technology for analyzing transcriptomes to support drug development and safety evaluation2.
Wang et al. Nat Biotechnol. Author manuscript; available in PMC 2014 November 25.
The FDA launched the community-wide MicroArray Quality Control (MAQC) consortium to investigate the reliability and utility of microarrays in identifying differentially expressed genes (DEGs) and predicting patient/toxicity outcomes based on gene-expression data in the first (MAQC-I)3, 4 and second (MAQC-II)5, 6 phases of the project, respectively. MAQC-I and MAQC-II demonstrated the critical roles of a comprehensive study design and a crowdsourcing model in reaching community-wide consensus on the fit-for-purpose use of emerging technologies. High-throughput sequencing technologies provide new methods for whole-transcriptome analyses of gene expression7. Recently published studies have compared data obtained from microarrays and RNA-seq in terms of technical reproducibility, variance structure, absolute expression and detection of DEGs or gene isoforms8–20 (Supplementary Table 1). Some of these studies suggested that RNA-seq exhibits lower precision for weakly expressed genes owing to the nature of sampling21, 22, whereas others found higher sensitivity of RNA-seq for gene detection23, 24. The varied conclusions can be attributed to the fact that these studies used few treatment conditions and hence did not cover a wide range of biological complexity. Furthermore, whether the prediction of toxicity outcomes from gene-expression data could be improved by using RNA-seq instead of microarrays has not been adequately addressed. Under the umbrella of the third phase of the MAQC consortium3–6, also known as the SEquencing Quality Control (SEQC) project, we conducted a comprehensive study to evaluate the differences and similarities between RNA-seq and microarrays in terms of identifying DEGs and developing predictive models.
In contrast to data generated as part of the SEQC project using reference RNA samples25, our study design goes beyond simply monitoring performance metrics: it compares the transcriptional response in rat liver that each platform detects across extensive chemical treatments, biological replication and a breadth of shared modes of action (MOA) of the chemicals. Specifically, we report the results of a comparative analysis of gene-expression responses profiled by Affymetrix microarray and Illumina RNA-seq in liver tissue from rats exposed to diverse chemicals. We used either microarray or RNA-seq data to generate DEGs and predictive models of the MOA of each chemical. This allowed us to assess the influence of the chemical (referred to hereafter as the ‘treatment effect’) on the concordance between RNA-seq and microarrays and on the performance of predictive models generated using each technology. Treatment effect is characterized by the number of DEGs and the overexpressed pathways underlying the MOA of the chemical. We found that (i) the concordance between array and sequencing platforms in the DEGs detected was positively correlated with the extent of the perturbation elicited by the treatment, (ii) RNA-seq performed better than microarrays at detecting weakly expressed genes, and (iii) gene expression–based predictive models generated from RNA-seq and microarray data performed similarly. The experimental design also allowed us to identify positive correlations between the number of differentially expressed RNA elements (mRNAs, splice variants, non-coding RNAs and exon-exon junctions) and the extent of the perturbation elicited by the treatment, and to examine treatment-induced alternative splicing and shortening of 3’ untranslated regions (UTRs).
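To make the notion of cross-platform DEG concordance and its relationship to treatment effect concrete, the following is a minimal Python sketch. It is not the analysis used in this study: the overlap measure (shared DEGs as a percentage of the union of both DEG lists), the use of the union size as a proxy for treatment effect, the function names, and the toy gene lists are all assumptions chosen for illustration.

```python
from math import sqrt


def deg_concordance(deg_rnaseq, deg_array):
    """Percent of the union of two DEG lists reported by both platforms.

    Assumed, Jaccard-style definition; the study's own metric may differ.
    """
    a, b = set(deg_rnaseq), set(deg_array)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: chemical -> (RNA-seq DEGs, microarray DEGs).
toy = {
    "chem_weak": ({"g1", "g2"}, {"g2", "g3", "g4"}),
    "chem_mid": ({"g1", "g2", "g3", "g4"}, {"g2", "g3", "g4", "g5"}),
    "chem_strong": (
        {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"},
        {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g9"},
    ),
}

# Treatment effect proxy (assumption): number of DEGs found by either platform.
effect = [len(seq | arr) for seq, arr in toy.values()]
concordance = [deg_concordance(seq, arr) for seq, arr in toy.values()]

for name, e, c in zip(toy, effect, concordance):
    print(f"{name}: {e} DEGs in union, concordance {c:.1f}%")
print(f"correlation(effect, concordance) = {pearson(effect, concordance):.2f}")
```

In this toy setting the chemical with the largest DEG list also shows the highest overlap, which mirrors the direction of finding (i) above. Real data would call for a rank-based correlation and a concordance definition matched to the DEG-calling procedure.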