SifiNet: A robust and accurate method to identify feature gene sets and annotate cells

Qi Gao, Zhicheng Ji, Liuyang Wang, Kouros Owzar, Qi-Jing Li, Cliburn Chan, Jichun Xie
DOI: https://doi.org/10.1101/2023.05.24.541352
2024-04-06
Abstract: SifiNet is a robust and accurate computational pipeline for identifying distinct gene sets, extracting and annotating cellular subpopulations, and elucidating intrinsic relationships among these subpopulations. Uniquely, SifiNet bypasses the cell clustering stage, commonly integrated into other cellular annotation pipelines, thereby circumventing potential inaccuracies in clustering that may compromise subsequent analyses. Consequently, SifiNet has demonstrated superior performance in multiple experimental datasets compared with other state-of-the-art methods. SifiNet can analyze both single-cell RNA and ATAC sequencing data, thereby rendering comprehensive multiomic cellular profiles. It is conveniently available as an open-source R package.
Bioinformatics
What problem does this paper attempt to address?
The paper aims to address several key issues in single-cell sequencing data analysis, particularly the identification of feature gene sets and the annotation of cell subpopulations. Specifically:

1. **Identification of Feature Gene Sets**: Existing methods typically either use a two-step approach (clustering followed by differential expression analysis) or identify feature gene sets by detecting highly variable genes. These methods are less accurate when the data has complex or subtle heterogeneity; in the two-step approach, for example, errors in the initial clustering step carry over into the subsequent feature gene identification.
2. **Cell Subpopulation Annotation**: Existing annotation methods also rely on clustering results, so clustering errors can lead to inaccurate annotations. A method that does not depend on clustering is therefore needed for more accurate cell annotation.

To address these issues, the paper proposes a new method called SifiNet, which has the following features:

- **No Clustering Required**: SifiNet identifies feature gene sets directly from the topological structure of the gene co-expression network, thereby avoiding errors that clustering could introduce.
- **Support for Multi-Omics Data**: SifiNet can handle both single-cell RNA sequencing and single-cell ATAC sequencing data, providing comprehensive multi-omics analysis of cell phenotypes.
- **High Accuracy and Robustness**: Validated on multiple experimental datasets, SifiNet performs well both in identifying feature gene sets and in improving cell annotation accuracy, especially when cell heterogeneity is complex.

In summary, SifiNet aims to provide a more accurate, efficient, and robust method for single-cell data analysis, overcoming the limitations of existing methods and revealing the intrinsic relationships between cell subpopulations.
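To make the clustering-free idea concrete, the following is a minimal Python sketch under stated assumptions. It is not SifiNet's published algorithm or its R package API: the function names (`binarize`, `coexpression_graph`, `gene_sets`, `annotate`), the thresholds, and the toy data are all hypothetical. It swaps in deliberately simple choices: each gene is binarized at its per-gene median, two genes are linked when the Pearson correlation of their binary profiles is at least 0.5, candidate feature gene sets are the connected components of that co-expression graph, and each cell is labeled by the set whose genes it expresses most. At no point are the cells clustered.

```python
import numpy as np


def binarize(counts, q=0.5):
    # Hypothetical step: a cell counts as "expressing" a gene when its count
    # exceeds that gene's q-quantile across all cells.
    thresholds = np.quantile(counts, q, axis=0)
    return counts > thresholds  # boolean array, shape (cells, genes)


def coexpression_graph(high, min_corr=0.5):
    # Link two genes when their binarized profiles have Pearson correlation
    # >= min_corr. Constant genes give NaN correlations, mapped to 0 (no edge).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(high.astype(float), rowvar=False)
    adj = np.nan_to_num(corr) >= min_corr
    np.fill_diagonal(adj, False)
    return adj


def gene_sets(adj, min_size=3):
    # Candidate feature gene sets = connected components of the gene graph
    # with at least min_size genes (iterative depth-first search).
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    sets = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            g = stack.pop()
            comp.append(g)
            for nb in np.flatnonzero(adj[g]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        if len(comp) >= min_size:
            sets.append(sorted(int(g) for g in comp))
    return sets


def annotate(high, sets, min_frac=0.5):
    # Score each cell by the fraction of each set's genes it expresses, assign
    # the best-scoring set, and leave cells below min_frac unlabeled (-1).
    scores = np.column_stack([high[:, s].mean(axis=1) for s in sets])
    labels = scores.argmax(axis=1)
    labels[scores.max(axis=1) < min_frac] = -1
    return labels


# Toy data (invented for illustration): 200 cells x 20 genes of Poisson noise;
# genes 0-4 are boosted in cells 0-99 and genes 5-9 in cells 100-199.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(200, 20))
counts[:100, :5] += rng.poisson(5.0, size=(100, 5))
counts[100:, 5:10] += rng.poisson(5.0, size=(100, 5))

high = binarize(counts)
sets = gene_sets(coexpression_graph(high))
labels = annotate(high, sets)
print("gene sets:", sets)
print("cells per label:", np.unique(labels, return_counts=True))
```

Because the gene sets come from the gene network rather than from cell clusters, a cell whose profile matches no set simply stays unlabeled (-1) instead of being forced into a cluster. A real pipeline would need a principled co-expression test and a more careful network criterion than plain connected components.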