Haplotype-Based Genotyping in Polyploids

Josh P. Clevenger, Walid Korani, Peggy Ozias-Akins, Scott Jackson
DOI: https://doi.org/10.3389/fpls.2018.00564
IF: 5.6
2018-04-26
Frontiers in Plant Science
Abstract: Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to accurately identify in polyploid crops due to the duplicative nature of polyploid genomes leading to low confidence in the true alignment of short reads. Implementing a haplotype-based method in contrasting subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results of the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes as input bam files and a vcf file and identifies haplotype-based markers. Haplotype discovery can be made within single reads or span paired reads, and can leverage long read technology by targeting any length of haplotype. Haplotype-based genotyping is applicable in all allopolyploid genomes and provides confidence in marker identification and in silico-based genotyping for polyploid genomics.
plant sciences
What problem does this paper attempt to address?
The main problem this paper addresses is **the challenge of accurately identifying single-nucleotide polymorphisms (SNPs) in polyploid crops**. Because polyploid genomes are complex and highly duplicated, short reads cannot be aligned with confidence, which lowers the accuracy of SNP identification. The authors propose and validate a haplotype-based genotyping method to improve the accuracy of SNP identification in polyploid species.

### Main problems:
1. **Complexity of polyploid genomes**: Polyploid crops carry multiple chromosome sets, so the genome contains many near-identical homologous sequences, and short reads are difficult to align to the correct genomic location.
2. **Low confidence in SNP identification**: Reads that originate from one subgenome may be aligned to the homologous region of another, and the resulting mismatches are reported as false-positive SNPs.
3. **Limitations of existing methods**: Traditional SNP arrays and reduced-representation sequencing strategies (such as GBS and RADSeq) provide large numbers of markers, but they suffer from sampling bias and cannot identify rare variants.

### Solutions:
The authors propose a haplotype-based genotyping method that identifies SNPs in polyploid crops by collecting haplotypes in each sample and contrasting them between samples. Because a haplotype links a candidate SNP to neighboring subgenome-specific bases, the method can distinguish sequences from different subgenomes more accurately, which improves the accuracy of SNP identification. The method is implemented in a standalone tool, HAPLOSWEEP, and was tested at large scale in peanut (Arachis hypogaea, an allotetraploid crop) using a 48K SNP array (Axiom Arachis2).

### Results:
- 74% of the 1,674 haplotype-based SNP markers included on the array were validated, considerably higher than previous methods used for peanut.
- Analysis of whole-genome resequencing data gives an estimated true positive rate (TPR) above 89% for the method.

### Conclusions:
Haplotype-based genotyping achieves higher accuracy in polyploid crops, effectively reduces false-positive SNP calls, and is applicable to other allopolyploid species.
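
The idea of collecting haplotypes per sample and contrasting them between samples can be illustrated with a minimal Python sketch. It is not HAPLOSWEEP's code or its actual algorithm: the function names `collect_haplotypes` and `contrast_haplotypes`, the `min_count` threshold, the positions, and the toy reads are all hypothetical, and reads are simplified to dictionaries mapping reference positions to bases rather than parsed from bam and vcf files. Position 100 stands in for a base that differs between subgenomes, and position 120 for the candidate SNP.

```python
from collections import Counter


def collect_haplotypes(reads, positions, min_count=2):
    """Return haplotypes (tuples of bases at `positions`) seen in >= min_count reads.

    Each read is a dict {reference_position: base}; this is an assumed,
    simplified stand-in for an aligned read.
    """
    counts = Counter()
    for read in reads:
        # Only reads spanning every position can report a full haplotype.
        if all(p in read for p in positions):
            counts[tuple(read[p] for p in positions)] += 1
    # Drop rare haplotypes, which are more likely sequencing errors.
    return {hap for hap, n in counts.items() if n >= min_count}


def contrast_haplotypes(sample_a, sample_b, positions, min_count=2):
    """Split haplotypes into those private to each sample and those shared."""
    haps_a = collect_haplotypes(sample_a, positions, min_count)
    haps_b = collect_haplotypes(sample_b, positions, min_count)
    return {
        "only_a": haps_a - haps_b,
        "only_b": haps_b - haps_a,
        "shared": haps_a & haps_b,
    }


# Toy data (hypothetical). Base at 100 tags the subgenome; 120 is the candidate SNP.
positions = (100, 120)
sample_a = [{100: "A", 120: "C"}] * 3 + [{100: "G", 120: "C"}] * 3
sample_b = (
    [{100: "A", 120: "C"}] * 3
    + [{100: "G", 120: "T"}] * 3
    + [{100: "G", 120: "C"}]   # single read, below min_count, so ignored
    + [{100: "A"}]             # does not span position 120, so ignored
)

result = contrast_haplotypes(sample_a, sample_b, positions)
print(result)
```

In this toy case the haplotype carrying `A` at position 100 is shared by both samples, while the haplotypes carrying `G` differ at position 120. A position-by-position view would see both `C` and `T` at position 120 in the second sample without knowing which subgenome carries the `T`; the haplotype view ties the difference to one subgenome-tagged sequence.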