A Cost-effective, High-throughput, Highly Accurate Genotyping Method for Outbred Populations

Denghui Chen, Apurva S Chitre, Khai-Minh H Nguyen, Katarina A Cohen, Beverly F Peng, Kendra S Ziegler, Faith Okamoto, Bonnie Lin, Benjamin B Johnson, Thiago M Sanches, Riyan Cheng, Oksana Polesskaya, Abraham A Palmer
DOI: https://doi.org/10.1101/2024.07.17.603984
2024-07-18
Abstract: Affordable sequencing and genotyping methods are essential for large-scale genome-wide association studies. While genotyping microarrays and reference panels for imputation are available for human subjects, non-human model systems often lack such options. Our lab previously demonstrated an efficient and cost-effective method to genotype heterogeneous stock rats using double-digest genotyping-by-sequencing. However, low-coverage whole-genome sequencing offers an alternative method that has several advantages. Here, we describe a cost-effective, high-throughput, high-accuracy genotyping method for N/NIH heterogeneous stock rats that can use a combination of sequencing data previously generated by double-digest genotyping-by-sequencing and data more recently generated by low-coverage whole-genome sequencing. Using double-digest genotyping-by-sequencing data from 5,745 heterogeneous stock rats (mean 0.21x coverage) and low-coverage whole-genome sequencing data from 8,760 heterogeneous stock rats (mean 0.27x coverage), we can impute 7.32 million bi-allelic single-nucleotide polymorphisms with a concordance rate >99.76% compared to high-coverage (mean 33.26x coverage) whole-genome sequencing data for a subset of the same individuals. Our results demonstrate the feasibility of using sequencing data from double-digest genotyping-by-sequencing or low-coverage whole-genome sequencing for accurate genotyping, and demonstrate techniques that may also be useful for other genetic studies in non-human subjects.
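The coverage figures quoted above (0.21x, 0.27x, 33.26x) are mean sequencing depths. The sketch below uses the common definition of mean depth: total bases from aligned reads divided by genome length. It is not taken from the paper. The helper names are hypothetical, and the 150 bp read length and 2.65 Gb genome length are illustrative assumptions rather than values reported by the authors.

```python
def mean_coverage(n_reads: int, read_length_bp: int, genome_length_bp: int) -> float:
    """Hypothetical helper: mean depth = total aligned bases / genome length.

    Assumes every read aligns and each read (or mate) is counted individually.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_reads * read_length_bp / genome_length_bp


def reads_for_target(target_depth: float, read_length_bp: int, genome_length_bp: int) -> float:
    """Hypothetical helper: approximate read count needed for a target mean depth."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return target_depth * genome_length_bp / read_length_bp


# Illustrative assumptions only, not values from the paper.
ASSUMED_GENOME_BP = 2_650_000_000
ASSUMED_READ_BP = 150

if __name__ == "__main__":
    # Compare the read budgets implied by the three depths quoted in the abstract.
    for depth in (0.21, 0.27, 33.26):
        n = reads_for_target(depth, ASSUMED_READ_BP, ASSUMED_GENOME_BP)
        print(f"{depth}x -> about {n / 1e6:.1f} million reads")
```

Under these assumptions, the roughly 120-fold gap between 0.27x and 33.26x translates directly into a 120-fold difference in reads per sample. This is the source of the cost advantage of low-coverage sequencing followed by imputation.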
Genetics
What problem does this paper attempt to address?
This paper addresses the high cost and low throughput of genotyping for large-scale genome-wide association studies (GWAS) in non-human model organisms, specifically N/NIH heterogeneous stock (HS) rats. Genotyping microarrays and reference panels for imputation are readily available for humans, but non-human model systems often lack such resources. An economical and efficient genotyping method is therefore needed to support genetic research in these organisms. To meet this need, the authors propose a low-cost, high-throughput, high-accuracy genotyping method that can combine double-digest genotyping-by-sequencing (ddGBS) and low-coverage whole-genome sequencing (lcWGS) data. The authors used previously generated ddGBS data from 5,745 HS rats (mean 0.21x coverage) and more recently generated lcWGS data from 8,760 HS rats (mean 0.27x coverage). From these data they imputed 7.32 million bi-allelic single-nucleotide polymorphisms (SNPs), with a concordance rate above 99.76% relative to high-coverage (mean 33.26x) whole-genome sequencing data for a subset of the same individuals. In short, the study's main goal is to provide an efficient, low-cost genotyping solution for HS rats that supports larger-scale genetic research.
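The accuracy figure above is a genotype concordance rate: the fraction of compared genotype calls where the imputed genotype matches the high-coverage "truth" genotype. The sketch below illustrates one common way to compute it. This section does not state the exact definition the authors used, such as how missing calls are handled or whether rates are averaged per SNP or per sample. The function name and the 0/1/2 alternate-allele-count encoding (with `None` for a missing call) are assumptions for illustration.

```python
from typing import Optional, Sequence


def genotype_concordance(
    imputed: Sequence[Optional[int]],
    truth: Sequence[Optional[int]],
) -> float:
    """Hypothetical helper: fraction of matching genotype calls.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2),
    with None marking a missing call. Sites missing in either set are skipped.
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must have equal length")
    compared = 0
    matches = 0
    for g_imp, g_true in zip(imputed, truth):
        if g_imp is None or g_true is None:
            continue  # only compare sites called in both data sets
        compared += 1
        if g_imp == g_true:
            matches += 1
    if compared == 0:
        raise ValueError("no sites were called in both data sets")
    return matches / compared


if __name__ == "__main__":
    imputed_calls = [0, 1, 2, 2, None, 0]
    truth_calls = [0, 1, 2, 1, 2, 0]
    # 5 sites are compared (one is missing in the imputed set); 4 of them match.
    print(genotype_concordance(imputed_calls, truth_calls))
```

Skipping sites that are missing in either call set keeps the rate from being deflated by sites that simply were not called. The trade-off is that the rate must then be read together with the number of sites actually compared.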