Integrated approach to generate artificial samples with low tumor fraction for somatic variant calling benchmarking

Aldo Sergi, Luca Beltrame, Sergio Marchini, Marco Masseroli
DOI: https://doi.org/10.1186/s12859-024-05793-8
IF: 3.307
2024-05-10
BMC Bioinformatics
Abstract: High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity; this poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing the sensitivity and detection performance of HTS approaches in such cases is paramount, but time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed for the generation of artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants and demonstrating their effectiveness in benchmarking low-fraction variant calling.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
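The abstract does not detail how the artificial samples are generated, so the following is only a minimal sketch of why low tumor fractions demand ultra-deep sequencing. It is not the authors' method. It assumes a clonal heterozygous somatic variant in a diploid region, tumor and normal cells of equal ploidy, no sequencing errors, and a hypothetical caller that reports a variant once at least `min_alt` reads support it. All function names are hypothetical.

```python
import math
import random


def expected_vaf(tumor_fraction: float, ploidy: int = 2, variant_copies: int = 1) -> float:
    """Variant allele fraction (VAF) of a clonal somatic variant.

    Assumption: tumor and normal cells share the same ploidy, so the
    variant's share of alleles is tumor_fraction * variant_copies / ploidy.
    """
    return tumor_fraction * variant_copies / ploidy


def simulate_alt_reads(depth: int, vaf: float, rng: random.Random) -> int:
    """Draw the number of variant-supporting reads at one site.

    Each of the `depth` reads independently carries the variant with
    probability `vaf` (a Binomial(depth, vaf) draw, ignoring sequencing errors).
    """
    return sum(1 for _ in range(depth) if rng.random() < vaf)


def detection_probability(depth: int, vaf: float, min_alt: int) -> float:
    """Exact P(alt reads >= min_alt) under Binomial(depth, vaf)."""
    below = sum(
        math.comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt)
    )
    return 1.0 - below


if __name__ == "__main__":
    vaf = expected_vaf(0.01)  # 1% tumor fraction -> VAF of 0.005
    rng = random.Random(42)
    for depth in (100, 1000, 5000):
        simulated = simulate_alt_reads(depth, vaf, rng)
        p_detect = detection_probability(depth, vaf, min_alt=3)
        print(f"depth={depth}: simulated alt reads={simulated}, P(detect)={p_detect:.4f}")
```

Under these assumptions, a 1% tumor fraction yields a detection probability of only about 1% at 100x coverage, but close to 1 at 5000x. This is why benchmarking low-fraction calling needs ultra-deep data, whether sequenced or simulated.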