Abstract:
Background Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. The efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software. Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets.
Results In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 “gold standard” WES and WGS datasets available from the Genome In A Bottle (GIAB) consortium. Additionally, we indirectly evaluated each pipeline’s performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than the other aligners, suggesting it should not be used for medical variant calling. When the other aligners were considered, the accuracy of variant discovery depended mostly on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency depended more on the quality and type of the input data. We also compared the consistency of variant calls in GIAB and non-GIAB samples. With a few important caveats, the best-performing tools showed little evidence of overfitting.
Conclusions The results show surprisingly large differences in the performance of cutting-edge tools, even in high-confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. There is also a need for better variant caller assessment in the repetitive regions of the coding genome.

Benchmarking small-variant genotyping in polyploids

Genotyping Polyploids from Messy Sequencing Data

Single Nucleotide Polymorphism Identification in Polyploids: A Review, Example, and Recommendations

Variant calling in polyploids for population and quantitative genetics

Sequence coverage required for accurate genotyping by sequencing in polyploid species

Benchmarking of Low Coverage Sequencing Workflows for Precision Genotyping in Eggplant

Impact of genotype‐calling methodologies on genome‐wide association and genomic prediction in polyploids

A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing

Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering

Haplotype-Based Genotyping in Polyploids

Tools for Genetic Studies in Experimental Populations of Polyploids

A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline

Genotyping single nucleotide polymorphisms and inferring ploidy by amplicon sequencing for polyploid, ploidy‐variable organisms

Large-scale Genotyping of Complex DNA

Benchmarking variant callers in next-generation and third-generation sequencing analysis

Integrating sequencing datasets to form highly confident SNP and indel genotype calls for a whole human genome

The Platinum Pedigree: A long-read benchmark for genetic variants

Benchmarking Imputed Low Coverage Genomes in a Human Population Genetics Context

A robust benchmark for detecting low-frequency variants in the HG002 Genome In A Bottle NIST reference material.

Benchmarking for genotyping and imputation using degraded DNA for forensic applications across diverse populations

Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery