GeoTyper: Automated Pipeline from Raw scRNA-Seq Data to Cell Type Identification

Cecily Wolfe, Yayi Feng, David Chen, Edwin Purcell, Anne Talkington, Sepideh Dolatshahi, Heman Shakeri
DOI: https://doi.org/10.48550/arXiv.2205.01187
IF: 4.31
2022-05-02
Genomics
Abstract: The cellular composition of the tumor microenvironment can directly impact cancer progression and the efficacy of therapeutics. Understanding the activity of immune cells, the body's natural defense mechanism, in the vicinity of cancerous cells is essential for developing beneficial treatments. Single-cell RNA sequencing (scRNA-seq) enables the examination of gene expression on an individual-cell basis, providing crucial information about both the disturbances in cell function caused by cancer and cell-cell communication in the tumor microenvironment. This technique generates large amounts of data, which require proper processing. Various tools exist to facilitate this processing, but they need to be organized into a standardized workflow that runs from data wrangling to visualization, cell type identification, and analysis of changes in cellular activity, from the standpoint of both malignant cells and the immune stromal cells that eliminate them. We aimed to develop a standardized pipeline (GeoTyper, https://github.com/celineyayifeng/GeoTyper) that integrates multiple scRNA-seq tools for processing raw sequence data extracted from NCBI GEO, visualizing results, performing statistical analysis, and identifying cell types. The pipeline leverages existing tools, such as Cell Ranger from 10x Genomics, Alevin, and Seurat, to cluster cells and identify cell types based on gene expression profiles. We tested and validated the pipeline on several publicly available scRNA-seq datasets, obtaining clusters that correspond to distinct cell types. By determining the cell types and their respective frequencies in the tumor microenvironment across multiple cancers, this workflow will help quantify changes in gene expression related to cell-cell communication and identify possible therapeutic targets.
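As a rough illustration of the final steps described in the abstract (labeling clusters by gene expression profile and computing cell type frequencies), the following is a minimal sketch in plain Python. It is not GeoTyper's code and does not call Seurat, Cell Ranger, or Alevin; the marker gene lists, the scoring rule (mean marker expression per cluster), the function names, and the toy expression values are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical marker genes per cell type (illustrative, not taken from the paper).
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
    "Macrophage": ["CD68", "CD14"],
}


def label_clusters(cluster_means, markers=MARKERS):
    """Give each cluster the cell type whose markers have the highest mean expression."""
    labels = {}
    for cluster, expr in cluster_means.items():
        scores = {
            cell_type: sum(expr.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


def cell_type_frequencies(cell_clusters, labels):
    """Return the fraction of cells assigned to each cell type."""
    counts = Counter(labels[cluster] for cluster in cell_clusters)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Toy data: average expression per cluster (missing genes count as 0.0)
# and the cluster assignment of eight cells.
cluster_means = {
    0: {"CD3D": 2.1, "CD3E": 1.8, "MS4A1": 0.1, "CD68": 0.0},
    1: {"CD3D": 0.1, "MS4A1": 2.5, "CD79A": 2.0, "CD68": 0.1},
    2: {"CD3D": 0.0, "CD68": 3.0, "CD14": 2.2},
}
cell_clusters = [0, 0, 1, 2, 2, 2, 0, 1]

labels = label_clusters(cluster_means)
freqs = cell_type_frequencies(cell_clusters, labels)
```

In a real analysis the cluster averages would come from an upstream clustering step, and marker sets would be chosen for the tissue and cancer type under study; a simple mean-of-markers score is only one of several possible annotation strategies.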