Effect of RNA-Seq data normalization on protein interactome mapping for Alzheimer's disease

DÜZ Elif,Tunahan ÇAKIR,Elif DÜZ
DOI: https://doi.org/10.1016/j.compbiolchem.2024.108028
IF: 3.737
2024-02-22
Computational Biology and Chemistry
Abstract:High throughput RNA sequencing brings new perspective to the elucidation of molecular mechanisms of diseases. Normalization is the first and most important step for RNA-Seq data, and it can differ based on the purpose of the analysis. Within-sample normalization methods (eg. TPM) are preferred when genes in a sample are compared with each other, and between-sample normalization methods (eg. deseq2, TMM, Voom) are used when the samples in a dataset are compared. Normalization approaches rescale the data, and, therefore, they affect the results of the analysis. Here, we selected two most commonly used Alzheimer's disease RNA-Seq datasets from ROSMAP and Mayo Clinic cohorts and mapped the differentially expressed genes on human protein interactome to discover disease-specific subnetworks. To this end, the raw count data were first processed with four different, commonly used RNA-Seq normalization methods (deseq2, TMM, Voom and TPM). Then, covariate adjustment was applied to the normalized data for gender, age of death and post-mortem interval. Each normalized dataset was separately mapped on the human protein-protein interaction network either in covariate-adjusted or non-adjusted form. Capturing known Alzheimer's disease genes and genes associated with the disease-related functional terms in the discovered subnetworks were the criteria to compare different normalization methods. Based on our results, applying covariate adjustment has a positive effect on normalization by removing the confounder effects. Covariate-adjusted TMM and covariate-adjusted deseq2 methods performed better in both transcriptome datasets.
biology,computer science, interdisciplinary applications
What problem does this paper attempt to address?