Using mathematical constraints to explain narrow ranges for allele-sharing dissimilarities

Xiran Liu, Zarif Ahsan, Noah A. Rosenberg
DOI: https://doi.org/10.1101/2024.11.19.624404
2024-11-21
Abstract: Allele-sharing dissimilarity (ASD) statistics are measures of genetic differentiation for pairs of individuals or populations. Given the allele-frequency distributions of two populations (possibly the same population), the expected value of an ASD statistic is the expectation of the pairwise dissimilarity between two individuals drawn at random, each from its associated allele-frequency distribution. For each of two ASD statistics, which we term D1 and D2, we investigate the extent to which the expected ASD is constrained by allele frequencies in the two populations; in other words, how is the magnitude of the measure bounded as a function of the frequency of the most frequent allelic type? We first consider the dissimilarity of a population with itself, obtaining bounds on expected ASD in terms of the frequency of the most frequent allelic type in the population. We then examine pairs of populations that might or might not possess the same most frequent allelic type. Across the unit interval for the frequency of the most frequent allelic type, the expected allele-sharing dissimilarity has a range that is more restricted than the [0,1] interval. The mathematical constraints on expected ASD help explain a pattern observed empirically in human populations: when averaged across loci, allele-sharing dissimilarities between pairs of individuals tend to vary only within a relatively narrow range.
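The idea of an expected dissimilarity constrained by the most frequent allele can be illustrated with a minimal sketch. The abstract does not define D1 or D2, so the code below does not implement either statistic. It assumes a simplified haploid analog: the probability that one allele drawn from each population differs, 1 - sum_i p_i q_i. The function names and example frequencies are hypothetical. The within-population bounds shown are the elementary ones that follow from M^2 <= sum_i p_i^2 <= M, where M is the frequency of the most frequent allele; they are loose and are not the bounds derived in the paper.

```python
from typing import Sequence, Tuple


def expected_mismatch(p: Sequence[float], q: Sequence[float]) -> float:
    """Probability that an allele drawn from p differs from one drawn from q.

    Simplified haploid analog (an assumption for illustration),
    not the paper's D1 or D2.
    """
    if len(p) != len(q):
        raise ValueError("p and q must list frequencies for the same alleles")
    return 1.0 - sum(pi * qi for pi, qi in zip(p, q))


def loose_within_bounds(m: float) -> Tuple[float, float]:
    """Elementary bounds on within-population mismatch given top frequency m.

    Uses m**2 <= sum(p_i**2) <= m, so 1 - m <= mismatch <= 1 - m**2.
    These are loose bounds, not the ones derived in the paper.
    """
    return 1.0 - m, 1.0 - m * m


# Hypothetical allele frequencies at one locus with three allelic types.
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]

within = expected_mismatch(p, p)    # population compared with itself
between = expected_mismatch(p, q)   # two different populations
lower, upper = loose_within_bounds(max(p))

print(f"within-population mismatch:  {within:.2f}")
print(f"between-population mismatch: {between:.2f}")
print(f"loose bounds given M={max(p)}: [{lower:.2f}, {upper:.2f}]")
```

Even these loose bounds show the qualitative point: once the frequency of the most frequent allele is fixed, the within-population value cannot range over all of [0,1].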
Biology
What problem does this paper attempt to address?