Systematic evaluation of de novo mutation calling tools using whole genome sequencing data

Anushi Shah, Steven Monger, Michael Troup, Eddie KK Ip, Eleni Giannoulatou
DOI: https://doi.org/10.1101/2024.08.28.610208
2024-08-30
Abstract: De novo mutations (DNMs) are genetic alterations that occur for the first time in an offspring. DNMs have been found to be a significant cause of severe developmental disorders. With the widespread use of next-generation sequencing (NGS) technologies, accurate detection of DNMs is crucial. Several bioinformatics tools have been developed to call DNMs from NGS data, but no study to date has systematically compared these tools. We used both real whole genome sequencing (WGS) data from a trio from the 1000 Genomes Project (1000G) and simulated trio data to evaluate four DNM calling tools: DeNovoGear, TrioDeNovo, PhaseByTransmission, and VarScan 2. For DNMs called in the real dataset, we observed 8.7% concordance between all tools, while up to 36.2% of DNMs were unique to each caller. For simulated trio WGS datasets spiked with 100 DNMs, the concordance rate was also low at 4.1%. DeNovoGear achieved the highest F1-score on both the real and simulated datasets. Our study provides valuable recommendations for the selection and application of DNM callers on WGS trio data.
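The two metrics named in the abstract, concordance between callers and F1-score against a known set of DNMs, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes concordance means the fraction of all distinct calls that every caller reports (the abstract does not define it), and it uses the standard F1 definition. The caller names, variant tuples, and function names are hypothetical toy values.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref, alt).
# This keying is an assumption made for illustration.
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of all distinct calls that every caller reports.

    Assumed definition: |intersection of all call sets| / |union|.
    """
    sets = list(calls.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def f1_score(called: Set[Variant], truth: Set[Variant]) -> float:
    """Standard F1: 2*TP / (2*TP + FP + FN)."""
    tp = len(called & truth)   # true DNMs that were called
    fp = len(called - truth)   # calls that are not true DNMs
    fn = len(truth - called)   # true DNMs that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: two callers and a two-variant truth set.
calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr2", 300, "G", "A"), ("chr3", 400, "T", "C")},
}
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}

print(concordance(calls))                 # 2 shared calls out of 4 distinct
print(f1_score(calls["caller_a"], truth))
print(f1_score(calls["caller_b"], truth))
```

In a simulated dataset the spiked-in DNMs play the role of `truth`, so F1 can be computed per caller; concordance needs no truth set and only compares the callers with one another.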
Bioinformatics