easybio: an R Package for Single-Cell Annotation with CellMarker2.0

Wei Cui
DOI: https://doi.org/10.1101/2024.09.14.609619
2024-09-16
Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to study biological activities at the cellular level, enabling the discovery of new cell types and the analysis of intercellular interactions. However, annotating cell types in scRNA-seq data is a crucial and time-consuming process, and its quality significantly influences downstream analyses. Accurate identification of potential cell types provides valuable insights for discovering new cell populations or identifying novel markers for known cells, which may be utilized in future research. While various methods exist for single-cell annotation, one of the most common approaches is to use known cell markers. The CellMarker2.0 database, a human-curated repository of cell markers extracted from published articles, is widely used for this purpose. However, it currently offers only a web-based tool, which can be inconvenient to integrate with workflows such as Seurat. To address this limitation, we introduce easybio, an R package designed to streamline single-cell annotation using the CellMarker2.0 database in conjunction with Seurat. easybio provides a suite of functions for querying the CellMarker2.0 database locally, offering insights into potential cell types for each cluster. In addition to single-cell annotation, the package also supports various bioinformatics workflows, including RNA-seq analysis, making it a versatile tool for transcriptomic research.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the problem of annotating cell types efficiently and accurately in single-cell RNA sequencing (scRNA-seq) data. Although multiple single-cell annotation methods exist, annotation with known cell markers is a common and effective approach. However, the widely used CellMarker2.0 database currently provides only a web-based tool, which is inconvenient to integrate with analysis pipelines such as Seurat. To overcome this limitation, the authors introduce an R package named **easybio**, which aims to simplify single-cell annotation in the following ways:

1. **Local query of the CellMarker2.0 database**: Users can query cell markers in the CellMarker2.0 database directly from the R environment, without relying on web-based tools.
2. **Automated annotation process**: easybio can automatically match the highly expressed genes of each cell cluster to potential cell types, which accelerates annotation and reduces manual errors.
3. **Flexibility**: Users can specify how many top-ranked genes are used for matching, trading off the specificity and sensitivity of the annotation.
4. **Support for multiple bioinformatics workflows**: In addition to single-cell annotation, easybio supports other bioinformatics tasks such as RNA-seq analysis.

Through these functions, easybio aims to improve the efficiency and accuracy of single-cell annotation while providing a more flexible and user-friendly tool that helps researchers understand cell heterogeneity and discover new cell populations.
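The marker-matching idea described above can be illustrated with a minimal sketch. It is written in plain Python to stay library-free and is not the easybio API: the function name `match_cell_types`, the `top_n` parameter, and the toy marker table are all assumptions made for illustration, and the table does not reproduce CellMarker2.0 records. The sketch takes the top-ranked genes of each cluster, counts how many of them appear among the known markers of each cell type, and ranks the candidate cell types by that count.

```python
from collections import Counter

# Toy marker table (hypothetical, for illustration only; not CellMarker2.0 data).
MARKER_DB = {
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
    "Monocyte": {"CD14", "LYZ", "FCGR3A"},
}


def match_cell_types(ranked_genes, marker_db, top_n=3):
    """Rank candidate cell types by marker overlap with a cluster's top genes.

    ranked_genes: genes of one cluster, ordered from most to least
                  highly expressed (assumed to be precomputed).
    top_n:        how many top-ranked genes to use for matching.
    """
    top = ranked_genes[:top_n]
    hits = Counter()
    for cell_type, markers in marker_db.items():
        overlap = [gene for gene in top if gene in markers]
        if overlap:
            hits[cell_type] = len(overlap)
    # Cell types with the most matching markers come first.
    return hits.most_common()


# Hypothetical per-cluster rankings, e.g. from a differential expression step.
clusters = {
    "0": ["CD3D", "IL7R", "LYZ", "CD3E"],
    "1": ["MS4A1", "CD79A", "CD14"],
}

annotations = {
    cluster: match_cell_types(genes, MARKER_DB, top_n=3)
    for cluster, genes in clusters.items()
}

for cluster, candidates in annotations.items():
    print(cluster, candidates)
```

In this toy setting, a smaller `top_n` restricts matching to the strongest genes of a cluster and returns fewer candidates, while a larger `top_n` admits more candidates at the cost of weaker matches. This mirrors the specificity and sensitivity trade-off mentioned in the list above.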