DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies

Jimin Park, Daniel E. Cook, Pi-Chuan Chang, Alexey Kolesnikov, Lucas Brambrink, Juan Carlos Mier, Joshua Gardner, Brandy McNulty, Samuel Sacco, Ayse Keskus, Asher Bryant, Tanveer Ahmad, Jyoti Shetty, Yongmei Zhao, Bao Tran, Giuseppe Narzisi, Adrienne Helland, Byunggil Yoo, Irina Pushel, Lisa A. Lansdon, Chengpeng Bi, Adam Walter, Margaret Gibson, Tomi Pastinen, Midhat S. Farooqi, Nicolas Robine, Karen H. Miga, Andrew Carroll, Mikhail Kolmogorov, Benedict Paten, Kishwar Shafin
DOI: https://doi.org/10.1101/2024.08.16.608331
2024-08-19
Abstract: Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data. It has modes for whole-genome and exome sequencing and can run on tumor-normal, tumor-only, and FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels.
Bioinformatics
What problem does this paper attempt to address?