A comprehensive workflow for allele-specific immune gene quantification and expression analysis in single-cell RNA-seq data

Ahmad Al Ajami, Jonas Schuck, Federico Marini, Katharina Imkeller
DOI: https://doi.org/10.1101/2024.12.10.627679
2024-12-15
Abstract:
Motivation: Immune molecules such as B and T cell receptors, human leukocyte antigens (HLAs), or killer Ig-like receptors (KIRs) are encoded in the most genetically diverse loci of the human genome. Many of these immune genes exhibit remarkable allelic diversity across populations. While computational methods for HLA typing from bulk RNA sequencing data have emerged, streamlined solutions for allele-specific quantification in single-cell RNA sequencing (scRNA-seq) are lacking. Moreover, no standardized data structure or analytical framework has been established to handle allele-specific immune gene expression data at the single-cell level.
Results: We present a comprehensive workflow to (1) automate allele-typing and allele-specific expression quantification of HLA transcripts in scRNA-seq data using a Snakemake workflow, scIGD (single-cell ImmunoGenomic Diversity), and (2) represent and interactively explore immune gene expression at different annotation levels using a multi-layer data structure implemented as an R/Bioconductor software package, SingleCellAlleleExperiment. We validated our approach on a diverse spectrum of scRNA-seq datasets and found that it performs consistently across different sequencing platforms and experimental setups. We illustrate how our method can be used to study loss of HLA expression in tumor cells or to discover differential HLA allele expression in specific immune cell subtypes. By capturing such allele-specific expression patterns and their variation, our workflow offers novel insights into human immunogenomic diversity.
Availability and implementation: scIGD is available under the MIT license at: https://github.com/AGImkeller/scIGD. SingleCellAlleleExperiment is available under the MIT license at: https://bioconductor.org/packages/SingleCellAlleleExperiment. scaeData provides validation datasets and is available under the MIT license at: https://bioconductor.org/packages/scaeData.
Data processed with scIGD are available at: https://doi.org/10.5281/zenodo.14033960.
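The idea of representing expression "at different annotation levels" can be illustrated with a small sketch. The Python below is not the SingleCellAlleleExperiment API (that package is written in R) and does not reproduce the authors' implementation. It assumes three levels (allele, gene, functional class), and the counts, the helper names allele_to_gene and aggregate, and the GENE_TO_CLASS lookup are all hypothetical, chosen only to show how allele-level counts could be summed into coarser layers.

```python
from collections import defaultdict

# Hypothetical allele-level counts for two cells (made-up numbers, illustration only).
allele_counts = {
    "cell_1": {"A*02:01": 5, "A*24:02": 3, "B*07:02": 4, "DRB1*15:01": 2},
    "cell_2": {"A*02:01": 0, "A*24:02": 6, "B*07:02": 1, "DRB1*15:01": 7},
}

# Assumed gene-to-class lookup: HLA-A and HLA-B are class I genes, HLA-DRB1 is class II.
GENE_TO_CLASS = {
    "HLA-A": "HLA class I",
    "HLA-B": "HLA class I",
    "HLA-DRB1": "HLA class II",
}


def allele_to_gene(allele: str) -> str:
    """Derive the gene symbol from an allele name such as 'A*02:01'."""
    return "HLA-" + allele.split("*", 1)[0]


def aggregate(counts, key_fn):
    """Sum per-cell counts after mapping each feature name through key_fn."""
    out = {}
    for cell, features in counts.items():
        summed = defaultdict(int)
        for name, value in features.items():
            summed[key_fn(name)] += value
        out[cell] = dict(summed)
    return out


# Layer 2: gene-level counts obtained by summing the alleles of each gene.
gene_counts = aggregate(allele_counts, allele_to_gene)
# Layer 3: functional-class counts obtained by summing the genes of each class.
class_counts = aggregate(gene_counts, GENE_TO_CLASS.__getitem__)

print(gene_counts)
print(class_counts)
```

Keeping the three layers side by side means the same cells can be examined at whichever resolution a question requires, for example allele level for differential allele expression and class level for overall loss of HLA expression.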
Biology