Abstract:

Background: The availability of high-density (HD) marker panels, genome-wide variants and sequence data creates an unprecedented opportunity to dissect the genetic basis of complex traits, enhance genomic selection (GS) and identify causal variants of disease. However, the number of parameters in the genetic association model has increased disproportionately relative to the number of phenotypes, which has further reduced statistical power and increased co-linearity and false positive rates. At best, HD panels do not significantly improve GS accuracy and, at worst, they reduce it. This is true for both regression and variance component approaches. To remedy this situation, some form of single nucleotide polymorphism (SNP) filtering or external information is needed. Current methods for prioritizing SNP markers (e.g. BayesB, BayesCπ) are sensitive to the increased co-linearity in HD panels, which could limit their performance.

Results: In this study, the usefulness of FST, a measure of allele frequency variation among populations, as an external source of information in GS was evaluated. A simulation was carried out for a trait with a heritability of 0.4. Data were divided into three subpopulations based on the phenotype distribution (bottom 5%, middle 90%, top 5%). Marker data were simulated to mimic 770 K and 1.5 million SNP marker panels. A ten-chromosome genome with 200 K and 400 K SNPs was simulated. Several scenarios with varying distributions for the quantitative trait loci (QTL) effects were simulated. Using all 200 K markers and no filtering, the accuracy of genomic prediction was 0.77. When marker effects were simulated from a gamma distribution, SNPs pre-selected based on the 99.5, 99.0 and 97.5% quantiles of the FST score distribution resulted in accuracies of 0.725, 0.797 and 0.853, respectively. Similar results were observed under other simulation scenarios. Thus, the accuracy obtained using all SNPs can be roughly matched using only the top 0.5 to 1% of all markers.

Conclusions: These results indicate that SNP filtering using already available external information could increase the accuracy of GS. This is especially important as next-generation sequencing technology becomes more affordable and accessible for human, animal and plant applications.
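The filtering idea summarized in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which FST estimator was used, so the sketch assumes the simple Wright form FST = (H_T - H_S) / H_T computed from allele frequencies in the three phenotype-defined subpopulations. The function names (`split_by_phenotype`, `fst_per_snp`, `select_snps`) and the 0/1/2 genotype coding are hypothetical choices made for illustration.

```python
import numpy as np


def split_by_phenotype(phenotypes, tail=0.05):
    """Label individuals 0 (bottom tail), 1 (middle) or 2 (top tail)."""
    lo, hi = np.quantile(phenotypes, [tail, 1.0 - tail])
    labels = np.ones(len(phenotypes), dtype=int)
    labels[phenotypes <= lo] = 0
    labels[phenotypes >= hi] = 2
    return labels


def fst_per_snp(genotypes, labels):
    """Per-SNP FST from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Assumed estimator: Wright's FST = (H_T - H_S) / H_T, with subpopulations
    weighted by their sample size. Monomorphic SNPs get FST = 0.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    sizes = np.array([(labels == g).sum() for g in groups], dtype=float)
    weights = sizes / sizes.sum()
    # Frequency of the counted allele within each subpopulation.
    freqs = np.stack([genotypes[labels == g].mean(axis=0) / 2.0 for g in groups])
    p_bar = weights @ freqs
    h_total = 2.0 * p_bar * (1.0 - p_bar)                 # expected heterozygosity, pooled
    h_within = weights @ (2.0 * freqs * (1.0 - freqs))    # mean within-subpopulation value
    fst = np.zeros_like(h_total)
    polymorphic = h_total > 0
    fst[polymorphic] = (h_total[polymorphic] - h_within[polymorphic]) / h_total[polymorphic]
    return fst


def select_snps(fst, quantile=0.99):
    """Return indices of SNPs whose FST is at or above the given quantile."""
    threshold = np.quantile(fst, quantile)
    return np.flatnonzero(fst >= threshold)


if __name__ == "__main__":
    # Toy data only: random genotypes and phenotypes, far smaller than a 200 K panel.
    rng = np.random.default_rng(1)
    genotypes = rng.integers(0, 3, size=(500, 2000))
    phenotypes = rng.normal(size=500)
    labels = split_by_phenotype(phenotypes)
    kept = select_snps(fst_per_snp(genotypes, labels), quantile=0.99)
    print(f"kept {kept.size} of {genotypes.shape[1]} SNPs")
```

The retained SNP indices would then be passed to whichever genomic prediction model is in use; that step is outside the scope of this sketch.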

An Integer Programming Approach for the Selection of Tag SNPs Using Multi-allelic LD.

Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping.

Tag SNP selection based on multivariate linear regression

Haplotype Block Partitioning and Tag SNP Selection Using Genotype Data and Their Applications to Association Studies

HapBlock: Haplotype Block Partitioning and Tag SNP Selection Software Using a Set of Dynamic Programming Algorithms.

Informative Snp Selection Methods Based on Snp Prediction

Accurate Haplotype Inference for Multiple Linked Single-Nucleotide Polymorphisms Using Sibship Data

MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression.

The effect of single nucleotide polymorphism identification strategies on estimates of linkage disequilibrium.

Regression-based Approach for Testing the Association Between Multi-Region Haplotype Configuration and Complex Trait

Dynamic Programming Algorithms for Haplotype Block Partitioning and Tag SNP Selection Using Haplotype Data or Genotype Data

Large-scale Genotyping of Complex DNA

Integer programming framework for pangenome-based genome inference

[Analysis and Application of SNP and Haplotype in the Human Genome].

Linear Algebraic Tag SNP Selection and Haplotype Reconstruction

IDSSR: an Efficient Pipeline for Identifying Polymorphic Microsatellites from a Single Genome Sequence

High density marker panels, SNPs prioritizing and accuracy of genomic selection

Detect and Adjust for Population Stratification in Population-Based Association Study Using Genomic Control Markers: an Application of Affymetrix Genechip® Human Mapping 10K Array

An automatic high-throughput single nucleotide polymorphism genotyping approach based on universal tagged arrays and magnetic nanoparticles

Tag-extension-based method for sensitive and specific genotyping of single nucleotide polymorphism on microarray.

Approximation Algorithms for the Selection of Robust Tag SNPs