Less is More: Relative Rank is More Informative Than Absolute Abundance for Compositional NGS Data.

Xubin Zheng, Nana Jin, Qiong Wu, Ning Zhang, Haonan Wu, Yuanhao Wang, Rui Luo, Tao Liu, Wanfu Ding, Qingshan Geng, Lixin Cheng
DOI: https://doi.org/10.1093/bfgp/elae045
2024-01-01
Briefings in Functional Genomics
Abstract: High-throughput gene expression data have been extensively generated and utilized in biological mechanism investigations, biomarker detection, disease diagnosis and prognosis. These applications encompass not only bulk transcriptome data, but also single-cell RNA-seq data. However, extracting reliable biological information from transcriptome data remains challenging due to the constraints of Compositional Data Analysis. Current data preprocessing methods, including dataset normalization and batch effect correction, are insufficient to address these issues and improve data quality for downstream analysis. Alternatively, qualification methods focusing on the relative order of gene expression (ROGER) are more informative than the quantification methods that rely on gene expression abundance. The Pairwise Analysis of Gene expression method is an enhancement of ROGER, designed for data integration in either sample space or feature space. In this review, we summarize the methods applied to transcriptome data analysis and discuss their potential in predicting clinical outcomes.
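
To illustrate why relative order can be more robust than absolute abundance, the following is a minimal sketch, not the ROGER or Pairwise Analysis of Gene expression implementation described in the review. It assumes a within-sample comparison of gene pairs; the function name `pairwise_order` and the count values are hypothetical. It shows that the pairwise ordering of genes within a sample is unchanged by a change in sequencing depth or by a log transform, both of which alter the absolute values.

```python
import numpy as np

def pairwise_order(sample):
    """Return a binary vector with 1 where gene i is expressed above gene j, for all i < j.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    x = np.asarray(sample, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all gene pairs with i < j
    return (x[i] > x[j]).astype(int)

# Made-up counts for four genes in one sample (illustrative values only).
counts = np.array([120.0, 30.0, 450.0, 8.0])

# The same sample at a different sequencing depth, and after a log transform.
deeper = counts * 2.5
logged = np.log2(counts + 1.0)

order_raw = pairwise_order(counts)
order_deeper = pairwise_order(deeper)
order_logged = pairwise_order(logged)

# Absolute values differ across the three versions, but the pairwise order does not.
print(order_raw, order_deeper, order_logged)
```

Because any strictly increasing transformation of a sample preserves which gene in a pair is higher, features built this way do not depend on the sample's total read count, which is the property that makes them attractive for compositional data.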