Abstract: Integrated analysis of human gene expression data from multiple studies has become essential in genomics research on complex traits. However, integrating data generated from different cohorts on different platforms, such as microarray and RNA-seq, often requires data preprocessing, including normalization. In this study, we empirically evaluate nine commonly used cross-platform normalization methods. We classify these methods into two main types: joint and separate normalization. Joint methods normalize multiple datasets together, while separate methods normalize each dataset independently. We further divide the methods into unsupervised and supervised approaches, depending on whether they use outcomes during normalization. Quantile Normalization (QN) is an example of a joint unsupervised method, Rank-in of a joint supervised method, and Training Distribution Matching (TDM) of a separate unsupervised method. We assess each method's ability to cluster samples, predict outcomes, and detect differentially expressed (DE) genes using three real datasets and simulated data. First, our real data analysis suggests that although joint supervised methods cluster sample groups better than the other two method groups, they use the outcome data twice, which artificially inflates their clustering performance. This bias is further demonstrated by their inflated type I error in DE analysis and by their clustering results on simulated data with no DE genes. Second, for outcome prediction, supervised normalization is not applicable, because the outcomes to be predicted cannot be used during normalization. Among the unsupervised methods, QN significantly outperforms the other approaches, regardless of whether RNA-seq data are used to predict microarray outcomes or vice versa. Finally, we compare normalization methods on downstream DE analysis using simulation. In addition to direct DE analysis on the combined normalized data using the non-parametric Wilcoxon rank-sum test, we also perform a meta-analysis that combines the p-values of the DE analyses from the individual datasets. For DE analysis, the meta-analysis consistently achieves the best balance between controlling type I error and maximizing power for DE gene detection. Our research suggests that while normalization is critical for the integrated analysis of transcriptomics data, simple QN is the most efficient and unbiased normalization approach for outcome prediction, and meta-analysis is the most appropriate approach for DE analysis.
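
As an illustration of the two approaches the abstract recommends, the following is a minimal sketch, not the authors' implementation. It assumes the simplest form of joint QN (every sample is mapped onto the mean of the sorted samples, with ties broken by gene order) and assumes Fisher's method for combining per-dataset p-values; the abstract does not state which combination rule the study uses. The function names `quantile_normalize` and `fisher_combine` and the toy values are hypothetical. The sketch uses only the Python standard library; in practice NumPy and SciPy would be the more usual choice.

```python
import math
from statistics import mean


def quantile_normalize(samples):
    """Joint quantile normalization (illustrative sketch).

    `samples` is a list of equal-length lists, one list per sample, with
    genes in the same order in every sample. Each sample is mapped onto a
    shared reference distribution: the mean of the sorted samples.
    Ties are broken by gene order, which is a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")
    # Reference value for rank k: mean of the k-th smallest value per sample.
    sorted_cols = [sorted(s) for s in samples]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (an assumption;
    the source does not name the combination rule).

    X = -2 * sum(log p) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function has a closed form, so no external library is needed.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_x) * total


# Toy data: two microarray samples and one RNA-seq sample, three genes each.
microarray = [[2.0, 4.0, 6.0], [3.0, 5.0, 10.0]]
rnaseq = [[100.0, 10.0, 50.0]]
joint = quantile_normalize(microarray + rnaseq)

# One gene, with one DE p-value per platform (made-up values).
combined_p = fisher_combine([0.05, 0.05])
```

After joint QN every sample holds the same set of values, so differences in scale between platforms are removed while the within-sample ranking of genes is preserved. The meta-analysis route avoids pooling expression values altogether: each dataset is tested on its own scale and only the p-values are combined.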

Evaluating Cross-Platform Normalization Methods for Integrated Microarray and RNA-seq Data Analysis

Large Diurnal Fluctuations in Intraocular Pressure Are an Independent Risk Factor in Patients With Glaucoma

Computational analysis of peripheral blood RNA sequencing data unravels disrupted immune patterns in Alzheimer's disease

Comprehensive transcript-level analysis reveals transcriptional reprogramming during the progression of Alzheimer's disease

Identification of microRNA-mRNA Regulatory Networks with Therapeutic Values in Alzheimer's Disease by Bioinformatics Analysis

Single-cell RNA sequencing analysis of human Alzheimer’s disease brain samples reveals neuronal and glial specific cells differential expression

Identification of candidate biomarkers and signaling pathways associated with Alzheimer's disease using bioinformatics analysis of next generation sequencing data

Alzheimer's Disease Protein Relevance Analysis Using Human and Mouse Model Proteomics Data

Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method

Network Medicine Approach for Analysis of Alzheimer's Disease Gene Expression Data.

A public resource of single cell transcriptomes and multiscale networks from persons with and without Alzheimer’s disease

Deep proteomic network analysis of Alzheimer’s disease brain reveals alterations in RNA binding proteins and RNA splicing associated with disease

Characterizing dysregulations via cell-cell communications in Alzheimer's brains using single-cell transcriptomes

Integrated analysis of the lncRNA-associated ceRNA network in Alzheimer's disease

Psychological scaling of AIAW code of ethics for coaches.

MU-BRAIN: MUltiethnic Brain Rna-seq for Alzheimer INitiative

Network analysis of brain and bone tissue transcripts reveals shared molecular mechanisms underlying Alzheimer's Disease and related dementias (ADRD) and Osteoporosis

Molecular crosstalk between COVID-19 and Alzheimer's disease using microarray and RNA-seq datasets: A system biology approach