Abstract: Single-nucleus joint ATAC- and RNA-sequencing (snMultiome) can be used to identify functionally divergent cell subpopulations within complex samples based on their transcriptomic and epigenetic profiles. Accurate cell type annotation is critical to successful snMultiome data analysis, and several computational methods have been developed for automatic annotation. Traditional cell type annotation methods first cluster cells using unsupervised learning on gene expression profiles, then label the clusters using aggregated cluster-level expression profiles and marker genes. These methods rely heavily on the clustering results: because the purity of clusters cannot be guaranteed, false detection of cluster features may lead to incorrect annotations. Further, canonical cell surface markers may not always be suitable for single-nucleus RNA-seq (snRNA-seq) studies because snRNA-seq generally yields fewer detected transcripts than typical single-cell RNA-seq (scRNA-seq). Moreover, cell type marker genes in snRNA-seq data may differ from those obtained with scRNA-seq data, reflecting biological differences between the cytoplasmic and nuclear RNA pools. Lastly, data obtained from malignant cells are best left out when establishing cell type reference data because they are too heterogeneous and patient-specific. Reference-based automated algorithms such as SingleR enable quick and unbiased classification by leveraging a collection of built-in reference data sets for human samples (e.g., the Human Primary Cell Atlas (microarray-based) and the combined Blueprint Epigenomics and ENCODE data set (RNA-seq-based)). Still, SingleR may return erroneous cell type classifications. Our data set was generated using the 10x Genomics snMultiome platform and comprises 296,557 nuclei from 82 frozen breast tumors, representing patients of diverse genetic ancestral backgrounds. Using these data, we sought to improve the accuracy of cell type annotation by SingleR.
To achieve this, we first separated malignant and non-malignant cells based on DNA copy number aberrations (aneuploidy) using CopyKAT. For cells determined to be non-malignant, we built a custom reference from the snRNA-seq data set recently made available by the Human Breast Cell Atlas, and then applied SingleR with this custom reference, in which each cell type is represented by single cells of that type, allowing a well-founded estimate of the confidence with which a cell type call can be made. Using this approach, we successfully identified 11 distinct cell types among non-malignant cells (fibroblast, adipocyte, pericyte, basal, luminal-secretory, luminal-HR, myeloid, mast, vascular, lymphatic, and T cells), which can then be further subclassified. Furthermore, we interrogated each cluster using known canonical markers and transferred the cell type labels to the snATAC-seq data. This approach enabled us to link peaks to genes in each cell type. We believe that this approach, which refines SingleR, can greatly improve accuracy and minimize misclassification when annotating cell types in breast tumors using snMultiome data.

Citation Format: Huaitian Liu, Alexandra Harris, Brittany Jenkins-Lord, Tiffany H. Dorsey, Francis Makokha, Shahin Sayed, Gretchen Gierach, Stefan Ambs. Cell type annotation using singleR with custom reference for single-nucleus multiome data derived from frozen human breast tumors [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB240.
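The annotation step described in the abstract, in which non-malignant cells are scored against a reference made of individual labeled cells, can be illustrated with a small Python sketch. This is not the SingleR implementation (SingleR is an R/Bioconductor package that also performs marker gene selection and fine-tuning) and it is not the authors' code. It only shows the general idea of correlation-based reference annotation under stated assumptions: the function names `spearman_matrix` and `annotate`, the 0.8 quantile, the "best score minus median score" confidence value, and the toy expression values are all hypothetical choices made for illustration, and the malignant flags are assumed to come from an upstream copy-number step such as CopyKAT.

```python
import numpy as np


def _rank_rows(x):
    # Ordinal ranks per row; ties are broken by position (a simplification).
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def spearman_matrix(query, ref):
    """Spearman correlation of every query cell with every reference cell."""
    q = _rank_rows(query)
    r = _rank_rows(ref)
    q -= q.mean(axis=1, keepdims=True)
    r -= r.mean(axis=1, keepdims=True)
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    r /= np.linalg.norm(r, axis=1, keepdims=True)
    return q @ r.T


def annotate(query, ref, ref_labels, is_malignant, quantile=0.8):
    """Label non-malignant query cells by correlation to a per-cell reference.

    Cells flagged as malignant are not scored and are reported as "malignant".
    Returns (calls, confidence); confidence is NaN for malignant cells.
    """
    ref_labels = np.asarray(ref_labels)
    labels = np.unique(ref_labels)
    keep = ~np.asarray(is_malignant, dtype=bool)

    corr = spearman_matrix(query[keep], ref)
    # Per-label score: a high quantile of correlations to that label's cells.
    scores = np.column_stack(
        [np.quantile(corr[:, ref_labels == lab], quantile, axis=1) for lab in labels]
    )
    best = scores.argmax(axis=1)
    # Assumed confidence measure: best score minus the median across labels.
    delta = scores.max(axis=1) - np.median(scores, axis=1)

    calls = np.full(query.shape[0], "malignant", dtype=object)
    calls[keep] = labels[best]
    confidence = np.full(query.shape[0], np.nan)
    confidence[keep] = delta
    return calls, confidence


# Toy data (made up): rows are cells, columns are six genes.
ref = np.array([
    [9, 8, 7, 1, 2, 3],
    [10, 7, 8, 2, 1, 3],
    [8, 9, 7, 1, 3, 2],
    [1, 2, 3, 9, 8, 7],
    [2, 1, 3, 10, 7, 8],
    [1, 3, 2, 8, 9, 7],
], dtype=float)
ref_labels = ["T-cell"] * 3 + ["fibroblast"] * 3

query = np.array([
    [7, 9, 8, 2, 1, 3],
    [3, 1, 2, 8, 7, 9],
    [5, 5, 5, 5, 5, 5],
], dtype=float)
is_malignant = [False, False, True]  # assumed output of an upstream CNA step

calls, confidence = annotate(query, ref, ref_labels, is_malignant)
print(list(zip(calls, confidence)))
```

Because each reference label is represented by several individual cells rather than one averaged profile, the per-label score summarizes a distribution of correlations, which is what makes a per-cell confidence value meaningful in this kind of scheme.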
