Abstract: Cross-platform normalization seeks to minimize technological bias between microarray and RNAseq whole-transcriptome data. Incorporating multiple gene expression platforms permits external validation of experimental findings and augments training sets for machine learning models. Here, we compare the performance of Feature Specific Quantile Normalization (FSQN) to a previously used but unvalidated and uncharacterized method we label as Feature Specific Mean Variance Normalization (FSMVN). We evaluate the performance of these methods for bidirectional normalization in the context of nested feature selection.
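To make the two methods concrete, here is a minimal Python/NumPy sketch of per-gene normalization of a target platform toward a reference platform. It is an illustration under stated assumptions, not the authors' code:

- `fsqn` and `fsmvn` are hypothetical helper names.
- FSQN is approximated by mapping each gene's ranks on the target platform onto that gene's quantiles on the reference platform. Ties are broken by order rather than averaged, and unequal sample counts are handled by linear interpolation.
- FSMVN is assumed, from its name, to z-score each gene within the target platform and rescale it to the reference platform's mean and standard deviation.

Both functions take samples-by-genes matrices whose columns refer to the same genes.

```python
import numpy as np


def fsqn(target: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Sketch of feature-specific quantile normalization (FSQN).

    Each gene (column) of `target` is mapped onto the empirical distribution
    of the same gene in `reference`. Assumption: ties are broken by order
    rather than averaged.
    """
    if target.shape[1] != reference.shape[1]:
        raise ValueError("target and reference must have the same genes (columns)")
    n = target.shape[0]
    out = np.empty(target.shape, dtype=float)
    for j in range(target.shape[1]):
        ref_sorted = np.sort(reference[:, j])
        # 0-based rank of each target value within this gene.
        ranks = np.argsort(np.argsort(target[:, j], kind="stable"), kind="stable")
        # Express ranks as positions in [0, 1] and read off the reference
        # gene's quantile at each position (linear interpolation when the
        # two platforms have different numbers of samples).
        pos = ranks / (n - 1) if n > 1 else np.zeros(n)
        ref_pos = np.linspace(0.0, 1.0, ref_sorted.size)
        out[:, j] = np.interp(pos, ref_pos, ref_sorted)
    return out


def fsmvn(target: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Sketch of feature-specific mean-variance normalization (FSMVN).

    Assumed form: z-score each gene within `target`, then rescale it to
    the mean and standard deviation of the same gene in `reference`.
    """
    if target.shape[1] != reference.shape[1]:
        raise ValueError("target and reference must have the same genes (columns)")
    mu_t = target.mean(axis=0)
    sd_t = target.std(axis=0, ddof=1)
    mu_r = reference.mean(axis=0)
    sd_r = reference.std(axis=0, ddof=1)
    # A gene that is constant in the target is mapped to the reference mean.
    sd_t = np.where(sd_t == 0, 1.0, sd_t)
    return (target - mu_t) / sd_t * sd_r + mu_r


# Toy data (hypothetical): 3 samples x 2 genes on each platform.
rnaseq = np.array([[10.0, 5.0], [30.0, 1.0], [20.0, 3.0]])
microarray = np.array([[7.0, 0.2], [9.0, 0.4], [8.0, 0.6]])

# Bidirectional use: either platform can serve as the reference.
rnaseq_on_array_scale = fsqn(rnaseq, microarray)   # RNAseq -> microarray
array_on_rnaseq_scale = fsqn(microarray, rnaseq)   # microarray -> RNAseq
```

In this toy case, FSQN gives each RNAseq gene exactly the microarray values for that gene, reassigned by rank. FSMVN, by contrast, only matches each gene's mean and standard deviation, so it does not force the full distributions to agree.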

What problem does this paper attempt to address?

The paper primarily explores the performance of two methods for cross-platform (microarray and RNA sequencing) normalization of gene expression data, Feature-Specific Quantile Normalization (FSQN) and Feature-Specific Mean-Variance Normalization (FSMVN), in supervised machine learning classification tasks.

### Research Background and Objectives

- **Research Background**: Molecular classification based on gene expression data is a powerful framework for disease research, treatment, and classification. In cancer research, it helps in understanding tumor heterogeneity, disease mechanisms, progression, and prognosis. However, comparing data generated on different technological platforms (such as microarray and RNA sequencing) introduces technical bias.
- **Research Objectives**:
  - Compare the performance of FSQN and FSMVN in bidirectional normalization (microarray to RNA sequencing, or RNA sequencing to microarray) and evaluate their effectiveness when combined with feature selection.
  - Determine whether FSQN and FSMVN maintain equivalent classification performance in both normalization directions, and whether feature selection affects this performance.

### Main Findings

- **Elimination of Batch Effects**: FSQN and FSMVN effectively eliminate batch effects between data from different technological platforms.
- **Classification Performance**: Without feature selection, FSQN and FSMVN provided clinically equivalent bidirectional model performance, comparable to within-platform models. Under optimal feature selection conditions, both methods also achieved balanced accuracy comparable to within-platform performance.
- **Impact of Feature Selection**: With feature selection, FSQN and FSMVN continued to perform well. As the number of selected genes decreased, their performance remained close to that obtained with single-platform data.
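The findings above are reported as balanced accuracy. This metric averages recall over classes, so a classifier cannot score well simply by predicting the most common subtype. A minimal, dependency-free Python sketch follows. The function name and the toy labels are hypothetical, and the example only illustrates the metric, not the paper's evaluation pipeline.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    totals = Counter(y_true)
    # Counter returns 0 for classes that were never predicted correctly.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical example: true subtypes of a test set on one platform versus
# predictions from a classifier trained on the other (normalized) platform.
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A"]
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))  # 0.75 0.5
```

Here plain accuracy is 0.75 but balanced accuracy is 0.5, because the minority class "B" is never predicted.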

### Conclusion

- FSQN and FSMVN are equally effective for generating supervised machine learning classifiers for molecular subtype classification.
- Under optimal modeling conditions, model accuracy on cross-platform data normalized with either method is comparable to accuracy on single-platform data.
- Caution is still warranted when combining data across platforms, because performance differences may depend on the classification problem and on the training and test distributions, among other factors.

Feature-specific quantile normalization and feature-specific mean–variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data

Evaluating Cross-Platform Normalization Methods for Integrated Microarray and RNA-seq Data Analysis

Feature selection followed by a novel residuals-based normalization simplifies and improves single-cell gene expression analysis

Selecting Reliable mRNA Expression Measurements Across Platforms Improves Downstream Analysis

Improving the Diversity of Captured Full-Length Isoforms Using a Normalized Single-Molecule RNA-sequencing Method

Normalization of Single-cell RNA-seq Data Using Partial Least Squares with Adaptive Fuzzy Weight

Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method

Non-linear Normalization for Non-UMI Single Cell RNA-Seq

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

Modified least-variant set normalization for miRNA microarray.

Performance evaluation of transcriptomics data normalization for survival risk prediction

Development of a robust and generalizable algorithm "gQuant" for accurate normalizer gene selection in qRT-PCR analysis

A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

A scaling normalization method for differential expression analysis of RNA-seq data