Leming Shi, Wendell D Jones, Roderick V Jensen, Stephen C Harris, Roger G Perkins, Federico M Goodsaid, Lei Guo, Lisa J Croner, Cecilie Boysen, Hong Fang, Feng Qian, Shashi Amur, Wenjun Bao, Catalin C Barbacioru, Vincent Bertholet, Xiaoxi Megan Cao, Tzu-Ming Chu, Patrick J Collins, Xiao-hui Fan, Felix W Frueh, James C Fuscoe, Xu Guo, Jing Han, Damir Herman, Huixiao Hong, Ernest S Kawasaki, Quan-Zhen Li, Yuling Luo, Yunqing Ma, Nan Mei, Ron L Peterson, Raj K Puri, Richard Shippy, Zhenqiang Su, Yongming Andrew Sun, Hongmei Sun, Brett Thorn, Yaron Turpaz, Charles Wang, Sue Jane Wang, Janet A Warrington, James C Willey, Jie Wu, Qian Xie, Liang Zhang, Lu Zhang, Sheng Zhong, Russell D Wolfinger, Weida Tong

Abstract
Background Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resulting variety of existing and emerging methods exacerbates the confusion and continuing debate in the microarray community over the appropriate choice of methods for identifying reliable DEG lists.
Results Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact of a few widely used gene selection procedures on the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, widely regarded as the "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of t-values: the more stringent the P-value threshold, the less reproducible the DEG list. These observations are also consistent with the results of extensive simulation calculations.
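Point (3) can be probed with a small simulation. The sketch below is not the simulation used in the study: it assumes log2-scale expression with normally distributed noise, gene-specific noise levels, three replicates per condition at each of two hypothetical sites, and a plain overlap fraction between the two sites' top-k lists as the reproducibility measure. All function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Pooled-variance (Student's) two-sample t-statistic."""
    na, nb = len(a), len(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def log2_fold_change(a, b):
    """Difference of group means, i.e. log2 FC when values are already log2-scale."""
    return statistics.fmean(a) - statistics.fmean(b)


def top_k(scores, k):
    """Indices of the k genes with the largest |score|."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)[:k]


def overlap_fraction(list1, list2):
    """Fraction of genes shared by two equal-length DEG lists."""
    return len(set(list1) & set(list2)) / len(list1)


def simulate_site(true_lfc, sds, n, rng):
    """One hypothetical site: n log2-scale replicates per group for every gene."""
    group_a = [[rng.gauss(lfc, sd) for _ in range(n)] for lfc, sd in zip(true_lfc, sds)]
    group_b = [[rng.gauss(0.0, sd) for _ in range(n)] for sd in sds]
    return group_a, group_b


def compare_rankings(n_genes=2000, n_deg=200, k=20, n=3, seed=1):
    """Overlap of top-k lists from two simulated sites, ranked by |t| or by |log2 FC|."""
    rng = random.Random(seed)
    # Assumed truth: the first n_deg genes shift by 1-3 log2 units, the rest do not;
    # per-gene noise SDs differ to mimic heterogeneous measurement precision.
    true_lfc = [rng.uniform(1.0, 3.0) if g < n_deg else 0.0 for g in range(n_genes)]
    sds = [rng.uniform(0.1, 1.0) for _ in range(n_genes)]
    lists = {"t": [], "fc": []}
    for _ in range(2):  # two sites measuring the same pair of samples
        a, b = simulate_site(true_lfc, sds, n, rng)
        # With equal n for every gene, ranking by |t| is the same as ranking by P.
        lists["t"].append(top_k([t_statistic(x, y) for x, y in zip(a, b)], k))
        lists["fc"].append(top_k([log2_fold_change(x, y) for x, y in zip(a, b)], k))
    return {name: overlap_fraction(*pair) for name, pair in lists.items()}


if __name__ == "__main__":
    for k in (10, 50, 200):
        print(k, compare_rankings(k=k))
```

Running it for several `k` and `seed` values shows how the overlap of |t|-ranked and FC-ranked lists changes with list length under these assumptions; the printed numbers depend entirely on the assumed parameters and say nothing about the MAQC data themselves.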
Conclusion We recommend FC ranking plus a non-stringent P cutoff as a straightforward baseline practice for generating more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small), and FC should be as large as possible. Our results provide practical guidance for choosing appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
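One way to express the recommended baseline in code is sketched below; it is an illustration under stated assumptions, not the authors' implementation. It assumes log2-scale data, a pooled two-sample t-test with five replicates per condition (8 degrees of freedom), and P < 0.05 as the non-stringent cutoff, which for that design is equivalent to |t| > 2.306. The helper `select_degs`, the constant `T_CRIT_DF8_P05`, and the toy genes are hypothetical.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 with 8 degrees of freedom
# (5 + 5 replicates, pooled variance). An assumption of this sketch: a different
# design or P cutoff needs a different value.
T_CRIT_DF8_P05 = 2.306


def t_statistic(a, b):
    """Pooled-variance (Student's) two-sample t-statistic."""
    na, nb = len(a), len(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_degs(expr_a, expr_b, n_select, t_crit=T_CRIT_DF8_P05):
    """Hypothetical helper: FC ranking after a non-stringent P (|t|) filter.

    expr_a, expr_b map gene ID -> list of log2-scale values for each condition.
    Returns at most n_select gene IDs, largest |log2 FC| first.
    """
    candidates = []
    for gene, a in expr_a.items():
        b = expr_b[gene]
        # P filter: |t| > t_crit is equivalent to two-sided P < alpha for this design.
        if abs(t_statistic(a, b)) > t_crit:
            candidates.append((gene, abs(statistics.fmean(a) - statistics.fmean(b))))
    # FC ranking: the filter only screens genes; the order comes from fold change.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in candidates[:n_select]]


# Hypothetical toy data (log2 scale, 5 replicates per condition).
TOY_A = {
    "big_fc": [4.0, 4.2, 3.8, 4.1, 3.9],
    "small_fc_tight": [1.00, 1.01, 0.99, 1.00, 1.00],
    "noisy": [3.0, -1.0, 5.0, -3.0, 6.0],
    "unchanged": [1.0, 2.0, 3.0, 4.0, 5.0],
}
TOY_B = {
    "big_fc": [0.0, 0.2, -0.2, 0.1, -0.1],
    "small_fc_tight": [0.00, 0.01, -0.01, 0.00, 0.00],
    "noisy": [0.0, 0.0, 0.0, 0.0, 0.0],
    "unchanged": [1.0, 2.0, 3.0, 4.0, 5.0],
}

if __name__ == "__main__":
    print(select_degs(TOY_A, TOY_B, n_select=2))
```

On the toy data, `noisy` has a larger fold change than `small_fc_tight` but fails the loose t filter. Among the genes that pass, ranking by |t| alone would put `small_fc_tight` (t of about 224) ahead of `big_fc` (t = 40), whereas FC ranking puts `big_fc` first. The stringency of the filter is controlled through `t_crit`.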
