Abstract: The T-cell receptor (TCR) population in humans comprises highly diversified heterodimers that regulate recognition of antigen-major histocompatibility complexes. Tremendous TCR sequence diversity is produced by somatic recombination of several TCR gene loci, each consisting of multiple gene segments. Next-generation sequencing (NGS) has enabled comprehensive profiling of the TCR repertoire across physiological and disease conditions, generating considerable interest in using TCR-seq to assess T-cell diversity. However, during NGS library construction and sequencing, errors and enzymatic inefficiencies can compromise the accuracy of the final data, particularly in calling the VD and VDJ recombined regions and in subsequent clonotype assignment. To increase the accuracy of NGS, Unique Molecular Identifiers (UMIs), which consist of short random nucleotide sequences, can be used to tag the original molecules in an NGS library, allowing correction of errors and biases. Applying UMIs has two well-studied technical limitations: (1) UMI sequences tend to collide when the number of input molecules is large, and (2) UMI sequences are not insulated from PCR and sequencing errors. Many computational approaches have been published to address these limitations. Among them, very few can resolve UMI collisions, and oversimplified error models have been used to handle UMI sequencing errors. Here we report a novel strategy and UMI structure that uses more complex UMIs, which are longer and of varying lengths. This minimizes UMI collisions while maximizing sequencing quality. Our UMI analysis pipeline, "UMI-nea", handles not only substitution errors but also indel errors and UMIs of different lengths. To mitigate the elevated computational burden, we developed a novel computational framework that processes sequence comparisons in parallel.
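As a rough illustration of the two limitations named above, the sketch below estimates how many distinct UMIs survive when many molecules draw from a fixed UMI pool, and computes an edit distance that counts insertions and deletions as well as substitutions. It is not the UMI-nea implementation: the function names are hypothetical, and uniform random UMI assignment and unit edit costs are simplifying assumptions.

```python
import math


def expected_distinct_umis(n_molecules: int, umi_length: int) -> float:
    """Expected number of distinct UMIs observed when each molecule draws one
    UMI uniformly at random from the 4**umi_length possible sequences
    (assumption: uniform, independent draws)."""
    pool = 4 ** umi_length
    # pool * (1 - (1 - 1/pool)**n), written with log1p/expm1 for numerical stability
    return pool * -math.expm1(n_molecules * math.log1p(-1.0 / pool))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions,
    so UMIs of different lengths can be compared."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for length in (8, 12, 18):
        distinct = expected_distinct_umis(100_000, length)
        print(f"UMI length {length}: ~{100_000 - distinct:.2f} molecules lost to collisions")
    print(levenshtein("ACGTACGT", "ACGACGT"))  # one deleted base
```

Under these assumptions, lengthening the UMI shrinks the expected collision loss sharply for the same input, which is the motivation for longer UMIs. A Hamming-style comparison would be undefined for the last pair because the two sequences differ in length.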
To account for the varied dispersion of PCR efficiency across molecules and for error-bearing UMIs in libraries with different input amounts and sequencing depths, we also developed a statistical framework that leverages a negative binomial model and a single-cell knee plot to set a dynamic threshold for estimating the number of original molecules. We validated UMI-nea on several simulated datasets and demonstrated that it achieves >99% completeness and homogeneity in recovering the original molecule count across various error rates and UMI lengths, outperforming the existing tools and methods it was compared with. We applied UMI-nea to profile the TCR repertoire of 8 PBMC samples sequenced on different Illumina platforms at different sequencing depths, and observed >85% reproducibility of clonotype calls in all samples. To test the sensitivity and specificity of UMI-nea, we sequenced pure cell line samples and cell line spike-in samples at different ratios and found very high recall and precision.

Citation Format: Jixin Deng, Jingxiao Zhang, Song Tian, Samuel J. Rulli, Hong Xu, John DiCarlo, Eric Lader. UMI-nea: A fast and robust UMI analysis approach to accurately identify and quantify TCR repertoire from targeted RNA sequencing with wide range of input molecules [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7425.
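The knee-plot idea mentioned in the abstract can be sketched as follows. This is only a generic geometric knee finder, assuming one read count per UMI cluster as input. It omits the negative binomial model entirely and is not the authors' thresholding procedure; `knee_threshold` is a hypothetical name.

```python
import math


def knee_threshold(read_counts):
    """Return a read-count cutoff at the knee of the log-log rank/count curve.

    Clusters are ranked by descending read count. The knee is taken as the
    point lying farthest above the straight line (chord) joining the first
    and last points of the curve in log10-log10 space.
    """
    counts = sorted((c for c in read_counts if c > 0), reverse=True)
    if len(counts) < 3:
        return counts[-1] if counts else 0
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]

    def height_above_chord(i):
        # Cross product of the chord with the vector to point i;
        # positive values lie above the chord (unnormalised distance).
        return dx * (ys[i] - ys[0]) - dy * (xs[i] - xs[0])

    knee = max(range(len(counts)), key=height_above_chord)
    return counts[knee]


if __name__ == "__main__":
    # Toy data: 50 well-supported clusters and 200 single-read clusters.
    toy = [100] * 50 + [1] * 200
    cutoff = knee_threshold(toy)
    kept = sum(c >= cutoff for c in toy)
    print(cutoff, kept)
```

On the toy data the cutoff falls at the end of the high-count plateau, so the single-read clusters, which in practice are often error-derived UMIs, are excluded from the molecule estimate.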
