Abstract 151: An Improved Computational Pipeline for Tumor Somatic Alterations Detection

CH Yan, Yingjiang Ye, Weihua Guo, Junhui Yang, Yufei Yang
DOI: https://doi.org/10.1158/1538-7445.am2021-151
IF: 11.2
2021-01-01
Cancer Research
Abstract:
Background: As high-throughput sequencing (HTS) technology has advanced, it has become widely used in cancer research. In particular, HTS enables genome-wide detection of somatic alterations, which reflect the dynamics of cancer development and inform therapeutic strategy. However, a number of complications, such as sample tissue lesions, imperfect experimental procedures and nucleotide polymorphisms between individuals, often introduce non-somatic noise and jeopardize the detection of somatic alterations. We present a validated computational pipeline that allows accurate identification of somatic alterations.
Method: We created a number of cell line mixture samples and sequenced them at an average depth of 1500x. Each mixture sample was composed of cell lines in pre-specified proportions in order to represent distinct tumor purity and clonality. Prior to mixture creation, variant candidates in each cell line sample were labeled as one of three categories: high-confidence germline variant, artifact, or blacklist (excluded from further assessment). The categorization was based on HaplotypeCaller results followed by a series of quality control steps. The allele frequency distribution of germline variants from each cell line was validated in the mixture samples. We used MuTect as the prototype pipeline and further optimized a number of parameter settings in a supervised fashion. Lastly, the optimized pipeline was evaluated using an independent dataset, and its performance was compared with that of a number of alternative variant callers.
Result: Our benchmark showed that the default settings of the 'alt_allele_in_normal' and 'nearby_gap_events' parameters in MuTect were overly strict: 73% of the variants rejected by these parameters were real somatic variants. We reasoned that, instead of using constants as parameter settings, the optimal parameters should be trained on data with similar sequencing depth. We therefore re-configured the parameters using logistic regression.
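As an illustration of what replacing a constant cutoff with a trained model might look like, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical per-candidate features (the alternate allele fraction observed in the matched normal and the number of nearby gap events) and a small made-up training set; the feature choice, the data and the function names `fit_logistic` and `predict_proba` are all assumptions introduced here for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def predict_proba(weights, x):
    """Estimated probability that a candidate is a true somatic variant."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical toy data, NOT from the abstract:
# (alt allele fraction in normal, nearby gap events); label 1 = true somatic.
train_x = [
    (0.00, 0), (0.01, 1), (0.00, 2), (0.02, 0), (0.01, 3),
    (0.20, 1), (0.35, 0), (0.15, 6), (0.40, 5), (0.10, 8),
]
train_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights = fit_logistic(train_x, train_y)
clean_candidate = predict_proba(weights, (0.01, 1))
noisy_candidate = predict_proba(weights, (0.30, 6))
print(clean_candidate, noisy_candidate)
```

In this sketch a candidate is kept or rejected according to the fitted probability, so the decision boundary adapts to the training data (for example, data at a similar sequencing depth) instead of relying on a fixed constant.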
Furthermore, the minimum supporting depth and minimum allele frequency were determined by AUROC. With the optimized pipeline, we achieved 96% sensitivity and a false positive rate (FPR) of 2/M, compared with 90% sensitivity and an FPR of 8/M using the default parameters. We further tested the optimized pipeline on an independent dataset and compared its performance with that of several well-known variant callers, including MuSE, VarScan2, Strelka, LoFreq, VarDict and MuTect2. The callers ranked as follows: our optimized pipeline > MuTect > MuTect2/Strelka/LoFreq > MuSE > VarScan2/VarDict.
Citation Format: Cheng Yan, Yingjiang Ye, Weihua Guo, Junhui Yang, Yufei Yang. An improved computational pipeline for tumor somatic alterations detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 151.
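The Result section states that the minimum allele frequency was determined by AUROC but does not say how the cutoff was read off the curve. The sketch below shows one plausible reading under stated assumptions: it computes the AUROC of allele frequency as a score, then picks the cutoff that maximizes sensitivity minus false positive rate (Youden's J, a criterion chosen here for illustration and not confirmed by the abstract). The allele frequencies and labels are made-up toy values.

```python
def auroc(scores, labels):
    """Probability that a random true variant outscores a random false one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def best_threshold(scores, labels):
    """Return (cutoff, J) maximizing sensitivity - FPR when calling score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_cutoff, best_j = t, j
    return best_cutoff, best_j

# Hypothetical toy data, NOT from the abstract: allele frequency per candidate,
# label 1 = true somatic variant, 0 = noise.
allele_freqs = [0.004, 0.006, 0.008, 0.010, 0.012, 0.02, 0.03, 0.05, 0.08, 0.12]
truth = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

auc = auroc(allele_freqs, truth)
min_af, youden_j = best_threshold(allele_freqs, truth)
print(auc, min_af, youden_j)
```

The same two functions could be applied to supporting read depth in place of allele frequency; whether the authors tuned the two thresholds jointly or separately is not stated.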