Abstract:

Background: As high-throughput sequencing (HTS) techniques have advanced, they have become widely used in cancer research. In particular, HTS enables genome-wide detection of somatic alterations, which inform both the dynamics of cancer development and therapeutic strategy. However, a number of complications, such as sample tissue lesions, imperfect experimental procedures, and nucleotide polymorphisms between individuals, often introduce non-somatic noise and jeopardize the detection of somatic alterations. We present a validated computational pipeline that allows accurate identification of somatic alterations.

Method: We created a number of cell line mixture samples and sequenced them at an average depth of 1500x. Each mixture sample was composed of cell lines in pre-specified proportions to represent distinct tumor purity and clonality. Prior to mixture creation, variant candidates in each cell line sample were labeled as one of three categories: high-confidence germline variant, artifact, or blacklist (excluded from further assessment). The categorization was based on HaplotypeCaller results followed by a series of quality control steps. The allele frequency distribution of germline variants from each cell line was validated in the mixture samples. We used MuTect as the prototype pipeline and optimized a number of its parameter settings in a supervised fashion. Lastly, the optimized pipeline was evaluated on an independent dataset, and its performance was compared with that of a number of alternative variant callers.

Result: Our benchmark showed that the default settings of the 'alt_allele_in_normal' and 'nearby_gap_events' parameters in MuTect were overly strict: 73% of the variants rejected by these parameters were in fact real somatic variants. We reasoned that, instead of using constants as parameter settings, the optimal parameters should be trained on data with similar sequencing depth. We therefore re-configured the parameters using logistic regression. Furthermore, the minimum supporting depth and minimum allele frequency were determined by AUROC. With the optimized pipeline, we achieved 96% sensitivity and a false positive rate (FPR) of 2/M, compared with 90% sensitivity and an FPR of 8/M using the default parameters. We further tested the optimized pipeline on an independent dataset and compared its performance with that of several well-known variant callers, including MuSE, VarScan2, Strelka, LoFreq, VarDict and MuTect2. The callers ranked as follows: our optimized pipeline > MuTect > MuTect2/Strelka/LoFreq > MuSE > VarScan2/VarDict.

Citation Format: Cheng Yan, Yingjiang Ye, Weihua Guo, Junhui Yang, Yufei Yang. An improved computational pipeline for tumor somatic alterations detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 151.
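The abstract states that the minimum allele frequency was determined by AUROC but gives no procedure. The following is a minimal sketch of one way such a cutoff could be chosen, not the authors' implementation. It assumes each candidate variant has an observed allele frequency and a truth label (1 = real somatic variant, 0 = noise) taken from the cell line mixtures. The use of Youden's J statistic to pick the cutoff, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def auroc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUROC: the probability that a randomly chosen true variant
    scores higher than a randomly chosen noise call (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Sweep every observed score as a cutoff (a call is kept when score >= cutoff)
    and return the cutoff maximizing Youden's J = sensitivity - false positive rate.
    Youden's J is an assumed criterion; the abstract does not name one."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = scores[0], float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: allele frequencies of candidate calls and truth labels.
allele_freqs = [0.001, 0.002, 0.004, 0.006, 0.01, 0.02, 0.05, 0.1]
truth = [0, 0, 1, 0, 1, 1, 1, 1]

area = auroc(allele_freqs, truth)
cutoff, youden_j = best_threshold(allele_freqs, truth)
print(f"AUROC={area:.3f}, min allele frequency cutoff={cutoff}, J={youden_j:.2f}")
```

On this toy data the AUROC is 14/15 and the selected cutoff is 0.01. The same sweep could be applied to supporting read depth in place of allele frequency.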

High efficiency error suppression for accurate detection of low-frequency variants

Selective multiplexed enrichment for the detection and quantitation of low-fraction DNA variants via low-depth sequencing

A superior strategy for single-cell mutational screening via multiplex-targeted QPCR using the BioMark HD microfluidic platform.

Optimizing Accuracy and Efficiency in Analyzing Non-UMI Liquid Biopsy Datasets Using the Sentieon ctDNA Pipeline

Detecting and Quantitating Low Fraction DNA Variants with Low-Depth Sequencing

Detection of low-frequency mutations in clinical samples by increasing mutation abundance via the excision of wild-type sequences

Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection

Ultra-deep sequencing with unique molecular identifier (UMI) for detection of ctDNA by fragment profiling using machine learning.

Ultra-sensitive molecular residual disease detection through whole genome sequencing with single-read error correction

OPUSeq Simplifies Detection of Low-Frequency DNA Variants and Uncovers Fragmentase-Associated Artifacts

Processing UMI Datasets at High Accuracy and Efficiency with the Sentieon ctDNA Analysis Pipeline

Somatic Point Mutation Calling in Low Cellularity Tumors

Abstract 151: an Improved Computational Pipeline for Tumor Somatic Alterations Detection

Detection of ultra-rare mutations by next-generation sequencing

Abstract 320: Development and automation of a streamlined targeted enrichment method for cancer mutation detection

Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data

Identification of DNA variants at ultra-low variant allele frequencies via Taq polymerase cleavage of wild-specific blockers

Calibration-free NGS quantitation of mutations below 0.01% VAF

Ultrasensitive and high-efficiency screen of de novo low-frequency mutations by o2n-seq

Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads

Evaluating Bioinformatics Processing of Somatic Variant Detection in cfDNA Using Targeted Sequencing with UMIs