Imbalance and Composition Correction Ensemble Learning Framework (ICCELF): A novel framework for automated scRNA-seq cell type annotation

Saishi Cui, Sina Nassiri, Issa Zakeri
DOI: https://doi.org/10.1101/2024.04.21.590442
2024-04-26
Abstract: Single-cell RNA sequencing (scRNA-seq) has gained broad utility and success in revealing novel biological insight in preclinical and clinical investigations. Cell type annotation remains a key analysis task with great influence on downstream interpretation of scRNA-seq data. Traditional machine learning approaches proposed for automated cell type annotation often overlook the inherent imbalance of cell type proportions within biological samples, and the compositional nature of sequencing-based gene expression quantification. In this study, we highlight the importance of accounting for cell type imbalance and compositionality of sequencing count data, and introduce the Imbalance and Composition Corrected Ensemble Learning Framework (ICCELF) as a novel approach to automated cell type annotation. We show via comprehensive evaluation on both simulated and real-world scRNA-seq data that, by effectively addressing class imbalance and data compositionality, ICCELF offers a robust and efficient solution that facilitates accurate and reliable cell type annotation, paving the way for enhanced biological discoveries.
Bioinformatics