A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

Richard Van, Daniel Alvarez, Travis Mize, Sravani Gannavarapu, Lohitha Chintham Reddy, Fatma Nasoz, Mira V. Han
DOI: https://doi.org/10.1186/s12859-024-05801-x
IF: 3.307
2024-05-11
BMC Bioinformatics
Abstract: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the data vary in quality, are procured differently, and contain unwanted noise, hampering the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?