Enhancing RNA-seq analysis by addressing all co-existing biases using a self-benchmarking approach with 2D structural insights

Qiang Su, Yi Long, Deming Gou, Junmin Quan, Qizhou Lian
DOI: https://doi.org/10.1093/bib/bbae532
IF: 9.5
2024-10-21
Briefings in Bioinformatics
Abstract: We introduce a groundbreaking approach: the minimum free energy–based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to combat the myriad of biases inherent in RNA-seq data. Central to our methodology is the MFE concept, facilitating the adoption of a Gaussian distribution model tailored to effectively mitigate all co-existing biases within a k-mer counting scheme. The MFE-GSB framework operates on a sophisticated dual-model system, juxtaposing modeling data of uniform k-mer distribution against the real, observed sequencing data characterized by nonuniform k-mer distributions. The framework applies a Gaussian function, guided by the predetermined parameters (mean and SD) derived from modeling data, to fit unknown sequencing data. This dual comparison allows for the accurate prediction of k-mer abundances across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Through validation with both engineered RNA constructs and human tissue RNA samples, its wide-ranging efficacy and applicability are demonstrated.
biochemical research methods, mathematical & computational biology
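The abstract describes fitting a Gaussian whose mean and SD are fixed in advance from modeling data to the observed k-mer counts in each MFE category. A minimal sketch of that idea follows. It is not the authors' implementation: it assumes that only the Gaussian amplitude is left free and estimated by least squares, and the function names, MFE bin values, and counts are hypothetical toy inputs chosen for illustration.

```python
import math


def gaussian_shape(x, mean, sd):
    """Unnormalized Gaussian evaluated at x (equals 1.0 at x == mean)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2))


def fit_amplitude(mfe_values, observed_counts, mean, sd):
    """Least-squares amplitude for a Gaussian with fixed mean and SD.

    Assumption: mean and sd come from the modeling data and are not refit;
    only the overall scale is estimated from the observed counts.
    """
    shape = [gaussian_shape(x, mean, sd) for x in mfe_values]
    numerator = sum(g * y for g, y in zip(shape, observed_counts))
    denominator = sum(g * g for g in shape)
    return numerator / denominator


def predicted_counts(mfe_values, amplitude, mean, sd):
    """Model-predicted k-mer abundance for each MFE category."""
    return [amplitude * gaussian_shape(x, mean, sd) for x in mfe_values]


# Hypothetical toy data: MFE category centers and observed k-mer counts.
mfe_bins = [-4.0, -3.0, -2.0, -1.0, 0.0]
observed = [10.0, 70.0, 90.0, 50.0, 20.0]

# Hypothetical parameters standing in for those derived from modeling data.
model_mean, model_sd = -2.0, 1.0

amplitude = fit_amplitude(mfe_bins, observed, model_mean, model_sd)
predicted = predicted_counts(mfe_bins, amplitude, model_mean, model_sd)

# Per-category ratio of predicted to observed counts; a ratio far from 1
# flags a category whose observed abundance deviates from the model.
ratios = [p / o for p, o in zip(predicted, observed)]
for mfe, obs, pred, ratio in zip(mfe_bins, observed, predicted, ratios):
    print(f"MFE {mfe:5.1f}: observed {obs:6.1f}, predicted {pred:6.1f}, ratio {ratio:.2f}")
```

Fixing the mean and SD leaves a single free parameter with a closed-form least-squares solution, which keeps the sketch dependency-free; the published framework may constrain or estimate the fit differently.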
What problem does this paper attempt to address?