Abstract 4926: Advancements in somatic variant calling from UG100 whole genome and whole exome sequencing data

Doron Shem-Tov, Maya Levy, Gil Hornung, Ilya Soifer, Hila Benjamin, Ariel Jaimovich, Adam Blattler, William Brandler, Robert Sugar, Isaac Kinde, Omer Barad, Doron Lipson
DOI: https://doi.org/10.1158/1538-7445.am2024-4926
IF: 11.2
2024-03-22
Cancer Research
Abstract: Somatic variant calling involves the identification of genomic alterations that occur in somatic cells, requiring deep coverage to enable high sensitivity for low-frequency variants. Characterizing somatic variants across the entire genome therefore benefits from novel cost-efficient sequencing platforms, such as UG100. Here, we present optimizations of variant calling tools for short and structural variants on WGS and WES data from UG100. For calling short variants, we optimized DeepVariant (DV) for somatic calling using data from matched tumor-normal sample pairs, improving both variant calling accuracy and pipeline running time (up to 10-fold). We defined the task of somatic variant calling as deciding whether the pileup image containing reads from the tumor and normal samples represents a true somatic variant (vs. a germline variant or an artifact). The challenge of robust variant calling using deep learning models is exacerbated in somatic calling, where sequencing depth and coverage variability are typically high. Our optimized DV overcomes these challenges through several data sampling strategies. First, allele-frequency-preserving down-sampling reduces the randomness of read sub-sampling in high-coverage regions. Second, alternative allele prioritization samples alt-allele-supporting reads first, making it possible to call variants at very high coverage loci without sacrificing sensitivity or computational efficiency. Finally, a Panel-of-Normals based on targeted WES data provides an additional improvement in precision for this assay type. We used these strategies to train two models, one for tumor characterization using WGS (T/N coverage: 40x-150x/40x-100x), and one for deep WES (T/N coverage: >500x/>120x). We called variants on simulated tumors using the WGS model. For VAF>10%, the model showed SNV recall >98% and indel recall >95% with a false-positive rate of 0.2/Mb. In the VAF range of 5-10%, indel recall was 67% and SNV recall was 86%.
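The two read-sampling strategies described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the read representation (a `(read_id, supports_alt)` tuple), the function names `af_preserving_downsample` and `alt_prioritized_sample`, and the `max_reads` cap are all hypothetical, and how the two strategies are combined in the actual pipeline is not stated.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical read representation: (read_id, supports_alt_allele).
Read = Tuple[str, bool]


def af_preserving_downsample(
    reads: List[Read], max_reads: int, rng: Optional[random.Random] = None
) -> List[Read]:
    """Down-sample a pileup so the alt-allele fraction of the sample
    matches the full pileup as closely as the read cap allows
    (an assumed reading of "allele-frequency-preserving")."""
    if len(reads) <= max_reads:
        return list(reads)
    rng = rng or random.Random(0)
    alt = [r for r in reads if r[1]]
    ref = [r for r in reads if not r[1]]
    # Fix the number of alt reads by proportion instead of leaving it to chance.
    n_alt = round(max_reads * len(alt) / len(reads))
    if alt and n_alt == 0:
        n_alt = 1  # assumption: never drop the alt allele entirely
    n_alt = min(n_alt, len(alt))
    n_ref = min(max_reads - n_alt, len(ref))
    return rng.sample(alt, n_alt) + rng.sample(ref, n_ref)


def alt_prioritized_sample(
    reads: List[Read], max_reads: int, rng: Optional[random.Random] = None
) -> List[Read]:
    """Take alt-supporting reads first, then fill the remaining slots
    with randomly chosen reference reads."""
    rng = rng or random.Random(0)
    alt = [r for r in reads if r[1]]
    ref = [r for r in reads if not r[1]]
    if len(alt) >= max_reads:
        return rng.sample(alt, max_reads)
    n_ref = min(max_reads - len(alt), len(ref))
    return alt + rng.sample(ref, n_ref)
```

For a 1000-read pileup with 50 alt-supporting reads and a cap of 100, the first function keeps 5 alt reads (preserving the 5% allele fraction), while the second keeps all 50 alt reads and fills the rest with reference reads. The second approach retains evidence for the variant at the cost of distorting the observed allele fraction within the sample.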
To demonstrate the utility of our somatic variant calling, we applied the models to call somatic variants from well-characterized cancer cell lines: COLO829, HCC1395 and HCC1143. Results showed F1>90% for variants with VAF>10%. The WES model was used to reliably call variants at VAF>5% on simulated tumors, with an average SNV recall of 99% with precision >99% and indel recall >86% with precision >94%. To analyze structural and copy-number variations, we optimized the assembly engine of GRIDSS to enable fast calling of structural variations and demonstrated that Control-FREEC can be used to call copy number variants. SV calling on COLO829/COLO829BL achieved sensitivity >95%. In conclusion, our research highlights the utility of UG100 within the field of oncology, demonstrating its capacity for comprehensive and precise somatic variant detection on both WGS and WES data.
Citation Format: Doron Shem-Tov, Maya Levy, Gil Hornung, Ilya Soifer, Hila Benjamin, Ariel Jaimovich, Adam Blattler, William Brandler, Robert Sugar, Isaac Kinde, Omer Barad, Doron Lipson. Advancements in somatic variant calling from UG100 whole genome and whole exome sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4926.
oncology