Pairwise linkage disequilibrium estimation for polyploids

David Gerard
DOI: https://doi.org/10.1111/1755-0998.13349
IF: 7.7
2021-03-01
Molecular Ecology Resources
Abstract:<p>Many tasks in statistical genetics involve pairwise estimation of linkage disequilibrium (LD). The study of LD in diploids is mature. However, in polyploids, the field lacks a comprehensive characterization of LD. Polyploids also exhibit greater levels of genotype uncertainty than diploids, and yet no methods currently exist to estimate LD in polyploids in the presence of such genotype uncertainty. Furthermore, most LD estimation methods do not quantify the level of uncertainty in their LD estimates. Our paper contains three major contributions. (i) We characterize haplotypic and composite measures of LD in polyploids. These composite measures of LD turn out to be functions of common statistical measures of association. (ii) We derive procedures to estimate haplotypic and composite LD in polyploids in the presence of genotype uncertainty. We do this by estimating LD directly from genotype likelihoods, which may be obtained from many genotyping platforms. (iii) We derive standard errors of all LD estimators that we discuss. We validate our methods on both real and simulated data. Our methods are implemented in the R package ldsep, available on the Comprehensive R Archive Network <a class="linkBehavior" href="https://cran.r-project.org/package=ldsep">https://cran.r-project.org/package=ldsep</a>.</p>
biochemistry & molecular biology,ecology,evolutionary biology
What problem does this paper attempt to address?
The main problem this paper addresses is the estimation of linkage disequilibrium (LD) in polyploid organisms. Specifically, it focuses on the following aspects:

1. **Characterization of LD in polyploid organisms**: Compared with diploids, methods for characterizing and estimating LD in polyploids are limited. The paper aims to provide a comprehensive characterization of LD for polyploid organisms.
2. **LD estimation in the presence of genotype uncertainty**: Genotyping in polyploids is often highly uncertain. Existing LD estimation methods usually assume that genotypes are known and error-free, an assumption that does not hold well in polyploids. The paper therefore proposes methods for estimating LD in the presence of genotype uncertainty.
3. **Definition and estimation of composite LD measures**: To handle complications such as violations of the random mating assumption and partial preferential pairing, the paper defines composite LD measures and provides methods for estimating them in the presence of genotype uncertainty.

### Main contributions of the paper

1. **Defined haplotypic and composite LD measures in polyploid organisms**: The composite LD measures can be expressed in terms of common statistical measures of association.
2. **Proposed methods for estimating haplotypic and composite LD in the presence of genotype uncertainty**: LD is estimated directly from genotype likelihoods, which can be obtained from many genotyping platforms.
3. **Derived the standard errors of all LD estimators**: These standard errors help to assess the reliability of the LD estimates.

### Method overview

- **Haplotypic LD estimation**: For autopolyploid organisms, assuming Hardy-Weinberg equilibrium (HWE), a method for estimating haplotypic LD in the presence of genotype uncertainty was derived.
- **Composite LD measures**: New composite LD measures were defined. They remain valid when the HWE assumption is violated and can be estimated from genotype information alone.
- **Maximum likelihood estimation**: The LD parameters were estimated by maximum likelihood, and the corresponding standard errors were derived.
- **Proportional bivariate normal distribution**: A new proportional bivariate normal distribution was introduced to model the genotype distribution at two loci, which improves the flexibility and accuracy of the estimation.

### Result verification

The paper validated the proposed methods on both simulated and real data. In the simulations, genotypes of autopolyploid organisms conforming to HWE were generated and the methods were tested across different read depths, ploidy levels, major allele frequencies, and Pearson correlation coefficients. The results show that the proposed methods can effectively estimate LD under these conditions.

In conclusion, this paper provides a comprehensive and practical set of methods for LD estimation in polyploid organisms, addressing the shortcomings of existing methods in dealing with genotype uncertainty.
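
### Illustrative sketch

The haplotypic and composite LD measures summarized above can be made concrete with a small Python sketch. It is not the paper's estimator and does not use the ldsep package. It assumes genotypes are known exactly, so it ignores genotype likelihoods, maximum likelihood fitting, and standard errors. The function names `haplotypic_ld` and `composite_ld` are hypothetical. The sketch further assumes that the composite LD coefficient is the covariance of allele dosages divided by the ploidy and that the composite correlation is the Pearson correlation of dosages, in line with the statement that composite measures are functions of common measures of association.

```python
from statistics import mean


def haplotypic_ld(p_ab, p_a, p_b):
    """Haplotypic LD from known frequencies (illustrative only).

    p_ab: frequency of haplotypes carrying the reference allele at both loci
    p_a, p_b: reference-allele frequencies at each locus (strictly in (0, 1))
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it can take given p_a and p_b.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def composite_ld(geno_a, geno_b, ploidy):
    """Composite LD from known dosages (assumed definitions, see lead-in).

    geno_a, geno_b: per-individual allele dosages in 0..ploidy
    Returns (Delta, rho), with Delta = Cov(G_A, G_B) / ploidy and
    rho = Cor(G_A, G_B). No HWE assumption is needed for these quantities.
    """
    if len(geno_a) != len(geno_b) or not geno_a:
        raise ValueError("dosage vectors must be non-empty and equal length")
    n = len(geno_a)
    mean_a, mean_b = mean(geno_a), mean(geno_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in geno_a) / n
    var_b = sum((b - mean_b) ** 2 for b in geno_b) / n
    delta = cov / ploidy
    # The correlation is undefined when either locus is monomorphic.
    if var_a == 0 or var_b == 0:
        return delta, float("nan")
    rho = cov / (var_a * var_b) ** 0.5
    return delta, rho


if __name__ == "__main__":
    # Haplotype frequencies: D is about 0.15, D' about 0.6, r^2 about 0.36.
    print(haplotypic_ld(p_ab=0.4, p_a=0.5, p_b=0.5))
    # Five tetraploid individuals with identical dosages at both loci.
    print(composite_ld([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], ploidy=4))
```

With real sequencing data the dosages are uncertain, which is the case the paper targets by working from genotype likelihoods. The sketch only shows what quantities are being estimated.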