Abstract: The T-cell receptor (TCR) population in humans comprises highly diversified heterodimers that mediate recognition of the antigen-major histocompatibility complex. Tremendous TCR sequence diversity is produced by somatic recombination of several TCR gene loci, each consisting of multiple gene segments. Next-generation sequencing (NGS) has enabled comprehensive profiling of the TCR repertoire across different physiological and disease conditions, generating much interest in using TCR-seq to assess T-cell diversity. However, during NGS library construction and sequencing, errors and enzymatic inefficiencies can compromise the accuracy of the final data, particularly in calling the VD and VDJ recombined regions and in subsequent clonotype assignment. To increase the accuracy of NGS, Unique Molecular Identifiers (UMIs), which consist of short random nucleotide sequences, can be used to mark original molecules in an NGS library, allowing for error and bias correction. There are two well-studied technical limitations in applying UMIs: (1) UMI sequences tend to collide when the number of input molecules is large, and (2) UMI sequences are not insulated from PCR and sequencing errors. To address these limitations, many computational approaches have been published. Among them, very few can resolve UMI collisions, and over-simplified error models have been implemented for handling UMI sequencing errors. Here we report a novel strategy and UMI structure that uses more complex UMIs that are longer and of varying lengths. This minimizes UMI collisions while maximizing sequencing quality. Our UMI analysis pipeline, "UMI-nea", handles not only substitution errors but also indel errors and UMIs of different lengths. We developed a novel computational framework that processes sequence comparisons in parallel to mitigate the elevated computational burden.
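The two limitations named above can be illustrated with a minimal Python sketch. It is not the UMI-nea implementation. It assumes UMIs are drawn uniformly at random from all 4^k sequences of length k, and it uses plain Levenshtein distance as a stand-in for an indel-aware comparison. The function names `expected_distinct_umis`, `collision_loss`, and `edit_distance` are hypothetical.

```python
import math


def expected_distinct_umis(n_molecules: int, umi_length: int) -> float:
    """Expected number of distinct UMIs observed when each of n molecules
    draws one UMI uniformly at random (an assumption) from 4**umi_length."""
    space = 4 ** umi_length
    # space * (1 - (1 - 1/space)**n), written with log1p/expm1 for stability.
    return -space * math.expm1(n_molecules * math.log1p(-1.0 / space))


def collision_loss(n_molecules: int, umi_length: int) -> float:
    """Expected fraction of molecules lost from the count due to collisions."""
    return 1.0 - expected_distinct_umis(n_molecules, umi_length) / n_molecules


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Position-wise mismatches; only defined for equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Longer UMIs shrink the expected collision loss for the same input.
    for k in (8, 12, 18):
        print(k, collision_loss(1_000_000, k))
    # One deleted base shifts the read, so a substitution-only comparison
    # sees many mismatches while the edit distance stays small.
    print(hamming("ACGTACGT", "CGTACGTA"), edit_distance("ACGTACGT", "CGTACGTA"))
```

Under these assumptions, the first loop shows why a larger UMI space helps with high-input libraries, and the last line shows why a substitution-only error model can misjudge UMIs that differ by an indel.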
To account for the varied dispersion of PCR efficiency across molecules and for error-bearing UMIs from libraries with different inputs and sequencing depths, we also developed a statistical framework that leverages a negative binomial model and the single-cell knee plot to set a dynamic threshold for estimating the number of original molecules. We verified UMI-nea on several simulated datasets and demonstrated that it achieves >99% completeness and homogeneity in recovering the original molecule count across various error rates and UMI lengths, outperforming the existing tools and methods compared. We applied UMI-nea to profile the TCR repertoire of 8 PBMC samples sequenced on different Illumina platforms at different sequencing depths and observed >85% reproducibility of clonotype calls in all samples. To test the sensitivity and specificity of UMI-nea, we sequenced pure cell line samples and cell line spike-in samples at different ratios and observed very high recall and precision.

Citation Format: Jixin Deng, Jingxiao Zhang, Song Tian, Samuel J. Rulli, Hong Xu, John DiCarlo, Eric Lader. UMI-nea: A fast and robust UMI analysis approach to accurately identify and quantify TCR repertoire from targeted RNA sequencing with wide range of input molecules [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7425.
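The completeness and homogeneity figures reported above can be made concrete with a short Python sketch. The abstract does not state how the two scores were computed, so the code assumes the standard entropy-based definitions for clustering evaluation. Here each read carries a true molecule label from a simulation and a UMI cluster label from the tool. The function name `homogeneity_completeness` and the toy labels are hypothetical.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g])
                for (_, g), c in joint.items())


def homogeneity_completeness(true_molecule, cluster):
    """Homogeneity: each cluster holds reads from a single true molecule.
    Completeness: all reads of a true molecule fall in a single cluster."""
    h_true = _entropy(true_molecule)
    h_cluster = _entropy(cluster)
    homogeneity = (1.0 if h_true == 0
                   else 1.0 - _conditional_entropy(true_molecule, cluster) / h_true)
    completeness = (1.0 if h_cluster == 0
                    else 1.0 - _conditional_entropy(cluster, true_molecule) / h_cluster)
    return homogeneity, completeness


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    # Uncorrected UMI errors split molecules: homogeneous but incomplete.
    print(homogeneity_completeness(truth, ["a", "b", "c", "d"]))
    # A UMI collision merges molecules: complete but not homogeneous.
    print(homogeneity_completeness(truth, ["a", "a", "a", "a"]))
```

Read this way, low completeness points to over-splitting from uncorrected UMI errors, and low homogeneity points to molecules merged by collisions or over-aggressive clustering.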
