Processing UMI Datasets at High Accuracy and Efficiency with the Sentieon ctDNA Analysis Pipeline

Jinnan Hu, Cai Jiang, Yu S. Huang, Haodong Chen, Hanying Feng, Donald Freed, Yan Qu, Rui Fan, Zhencheng Su, Weizhi Chen
DOI: https://doi.org/10.1101/2022.06.03.494742
2024-01-13
Abstract: Liquid biopsy enables identification of low allele frequency (AF) tumor variants and novel clinical applications such as minimal residual disease (MRD) monitoring. However, challenges remain, primarily due to limited sample volume and the low number of reads supporting low-AF variants. Because of the low AFs, some clinically significant variants are difficult to distinguish from errors introduced by PCR amplification and sequencing. Unique Molecular Identifiers (UMIs) have been developed to further reduce base error rates and improve variant calling accuracy, enabling better discrimination between background errors and real somatic variants. While multiple UMI-aware ctDNA analysis pipelines have been published and adopted, their accuracy and runtime efficiency could be improved. In this study, we present the Sentieon ctDNA pipeline, a fast and accurate solution for small somatic variant calling from ctDNA sequencing data. The pipeline consists of four core modules: alignment, consensus generation, variant calling, and variant filtering. We benchmarked the ctDNA pipeline using both simulated and real datasets, and found that the Sentieon ctDNA pipeline is more accurate than alternatives.
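To make the role of UMIs concrete, the following is a minimal sketch of how reads sharing a UMI can be collapsed into a consensus sequence by per-base majority vote. It is an illustration only and not the Sentieon consensus algorithm, which the abstract does not describe. The function name `umi_consensus`, the `(umi, position, sequence)` read representation, and the thresholds `min_family_size` and `min_agreement` are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2, min_agreement=0.6):
    """Collapse reads into one consensus sequence per (UMI, position) family.

    Hypothetical helper for illustration. `reads` is an iterable of
    (umi, position, sequence) tuples; sequences within one family are
    assumed to be already aligned and of equal length.
    """
    # Group reads that share both a UMI and an alignment start position.
    families = defaultdict(list)
    for umi, pos, seq in reads:
        families[(umi, pos)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        # Families with too few reads cannot outvote an error, so skip them.
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Emit "N" when no base reaches the required agreement fraction.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus


example_reads = [
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACGT"),
    ("AACG", 100, "ACTT"),  # one read carries a likely PCR/sequencing error
    ("GGTA", 100, "ACTT"),
    ("GGTA", 100, "ACTT"),  # the same base is consistent within this family
    ("TTTT", 200, "AAAA"),  # singleton family, dropped
]
result = umi_consensus(example_reads)
```

In this toy input, the mismatch in the `AACG` family is outvoted and treated as an error, while the `GGTA` family supports the alternate base in every read, so it survives into the consensus and remains a candidate variant for a downstream caller.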
Bioinformatics