DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing

Davoud Torkamaneh,Jérôme Laroche,Brian Boyle,François Belzile
DOI: https://doi.org/10.1093/bioinformatics/btz473
IF: 5.8
2020-01-01
Bioinformatics
Abstract:Motivation: Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step toward genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individual. Amongst these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite the extensive improvements of the RSAS methods in the last decade, the estimation of the number of reads (i.e. read depth) required per sample for an efficient and effective genotyping remains mostly based on trial and error. Results: Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of different biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded estimated accuracies of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities. Availability and implementation: https://bitbucket.org/jerlar73/DepthFinder. Supplementary information: Supplementary data are available at Bioinformatics online.
What problem does this paper attempt to address?