Abstract:

Background: Single-nucleotide polymorphism (SNP) effects can be backsolved from ssGBLUP genomic estimated breeding values (GEBV) and used for genome-wide association studies (ssGWAS). However, obtaining p-values for those SNP effects relies on the inversion of dense matrices, which poses computational limitations in large genotyped populations. In this study, we present a method to approximate SNP p-values for ssGWAS with many genotyped animals. This method combines a sparse approximation of the inverse of the genomic relationship matrix ($\mathbf{G}_{APY}^{-1}$), built with the algorithm for proven and young (APY), with an approximation of the prediction error variance of SNP effects that does not require the inversion of the left-hand side (LHS) of the mixed model equations. To test the proposed p-value computing method, we used a reduced population of 50K genotyped animals and compared the approximated SNP p-values with benchmark p-values obtained with the direct inverse of the LHS built with an exact genomic relationship matrix ($\mathbf{G}^{-1}$). Then, we applied the proposed approximation method to obtain SNP p-values for a larger population of 450K genotyped animals.

Results: The same genomic regions on chromosomes 7 and 20 were identified across all p-value computing methods when using 50K genotyped animals. In terms of computational requirements, obtaining p-values with the proposed approximation reduced the wall-clock time 38-fold and the memory requirement tenfold compared to using the exact inversion of the LHS. When the approximation was applied to a population of 450K genotyped animals, two new significant regions on chromosomes 6 and 14 were uncovered, indicating an increase in GWAS detection power when including more genotypes in the analyses. Obtaining p-values with the approximation and 450K genotyped individuals took 24.5 wall-clock hours and 87.66 GB of memory, which is expected to increase linearly with the addition of noncore genotyped individuals.

Conclusions: With the proposed method, obtaining p-values for SNP effects in ssGWAS is computationally feasible in large genotyped populations. The computational cost of obtaining p-values in ssGWAS may therefore no longer be a limitation in populations with many genotyped animals.
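The final step of such an analysis, turning a SNP effect and its variance into a p-value, can be illustrated with a minimal sketch. It assumes that each standardized SNP effect (the effect divided by the square root of its approximated prediction error variance) is approximately standard normal, so that a two-sided p-value follows from the normal tail. The function name `snp_p_values` and the input values are hypothetical and are not taken from the study; the sketch does not cover the APY inverse or the variance approximation itself.

```python
import math


def snp_p_values(snp_effects, effect_variances):
    """Two-sided normal p-values for SNP effects.

    Assumption: effect_variances holds the (approximated) prediction
    error variance of each SNP effect, and effect / sqrt(variance)
    is treated as a standard normal test statistic.
    """
    p_values = []
    for a_hat, var in zip(snp_effects, effect_variances):
        if var <= 0:
            raise ValueError("variance of a SNP effect must be positive")
        z = a_hat / math.sqrt(var)
        # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)); erfc stays
        # accurate for the very small p-values typical of GWAS peaks.
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


# Hypothetical inputs: three SNP effects sharing the same variance.
effects = [0.00, 0.02, -0.05]
variances = [1e-4, 1e-4, 1e-4]
p = snp_p_values(effects, variances)
```

Using the complementary error function rather than `1 - cdf` avoids losing precision when the test statistic is large, which is the case of interest when screening for significant regions.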

Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

EBT: a Statistic Test Identifying Moderate Size of Significant Features with Balanced Power and Precision for Genome-Wide Rate Comparisons

GWAS significance thresholds in large cohorts

Revisiting the genome-wide significance threshold for common variant GWAS

Estimation of Genotype Error Rate Using Samples with Pedigree Information--an Application on the GeneChip Mapping 10K Array.

Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study.

Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes

Marker effect p-values for single-step GWAS with the algorithm for proven and young in large genotyped populations

Detect and Adjust for Population Stratification in Population-Based Association Study Using Genomic Control Markers: an Application of Affymetrix Genechip® Human Mapping 10K Array

A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms

Missing Call Bias in High-Throughput Genotyping

Robust Methods for Disease-Genotype Association in Genetic Association Studies: Calculate P-values Using Exact Conditional Enumeration instead of Asymptotic Approximations

A Gene Selection Method for GeneChip Array Data with Small Sample Sizes

A Shrinkage Method for Testing the Hardy–Weinberg Equilibrium in Case‐Control Studies

Accounting for multiple comparisons in a genome-wide association study (GWAS)

High density marker panels, SNPs prioritizing and accuracy of genomic selection

Estimation of a significance threshold for epigenome-wide association studies

Accurate and Fast Small P-Value Estimation for Permutation Tests in High-Throughput Genomic Data Analysis with the Cross-Entropy Method.

Statistical power and significance testing in large-scale genetic studies

Technical reproducibility of genotyping SNP arrays used in genome-wide association studies.

Powerful gene-based testing by integrating long-range chromatin interactions and knockoff genotypes