A multi-platform normalization method for meta-analysis of gene expression data

Rachisan Djiake Tihagam, Sanchita Bhatnagar
DOI: https://doi.org/10.1016/j.ymeth.2023.06.012
IF: 4.647
Methods
Abstract: Transcriptomic profiling is a mainstay of translational cancer research and is often used to identify cancer subtypes, stratify patients into responders and non-responders, predict survival, and identify potential targets for therapeutic intervention. Analysis of gene expression data gathered by RNA sequencing (RNA-seq) and microarray is generally the first step in identifying and characterizing cancer-associated molecular determinants. Methodological advances and the reduced cost of transcriptomic profiling have increased the number of publicly available gene expression profiles for cancer subtypes. Data from multiple datasets are routinely integrated to increase the number of samples, improve statistical power, and provide better insight into the heterogeneity of the biological determinant. However, using raw data from multiple platforms, species, and sources introduces systematic variation due to noise, batch effects, and biases. The integrated data are therefore mathematically adjusted through normalization, which allows direct comparison of expression measures among studies while minimizing technical and systematic variation. This study applied meta-analysis to multiple independent Affymetrix microarray and Illumina RNA-seq datasets available through the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). We previously identified tripartite motif containing 37 (TRIM37) as a breast cancer oncogene that drives tumorigenesis and metastasis in triple-negative breast cancer. In this article, we adapted and assessed the validity of Stouffer's z-score normalization method to interrogate TRIM37 expression across different cancer types using multiple large-scale datasets.
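The abstract names Stouffer's z-score method but does not give its formula. As a rough illustration only, the textbook form standardizes expression within each dataset and then combines the per-dataset z-scores as Z = sum(w_i * z_i) / sqrt(sum(w_i^2)). The sketch below assumes that standard form; the function names, the toy values, and the choice of weights are hypothetical, and the adapted procedure used in the paper may differ.

```python
import math
from statistics import mean, stdev


def zscore_within_dataset(values):
    """Standardize one dataset's expression values to mean 0 and SD 1.

    Assumption: each dataset is standardized on its own scale, so that
    values from different platforms become comparable.
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("expression values have zero variance")
    return [(v - mu) / sd for v in values]


def stouffer_combine(z_scores, weights=None):
    """Combine per-dataset z-scores with the textbook Stouffer formula.

    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)); equal weights by default.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("need exactly one weight per z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical toy values, not data from the paper: z-scores for one gene
# in three datasets, combined with equal weights.
combined = stouffer_combine([1.0, 2.0, 3.0])
print(round(combined, 4))
```

A common weighting choice is the square root of each dataset's sample size, passed through `weights`; whether the paper weights datasets this way is not stated in the abstract.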