Revealing the range of maximum likelihood estimates in the admixture model.

Carola Sophia Heinzel, Franz Baumdicker, Peter Pfaffelhuber
DOI: https://doi.org/10.1101/2024.10.18.619150
2024-10-20
Abstract: Many ancestry inference tools, including STRUCTURE and ADMIXTURE, rely on the admixture model to infer both allele frequencies p and individual admixture proportions q for a collection of individuals relative to a set of hypothetical ancestral populations. We show that under realistic conditions the likelihood in the admixture model is typically flat in some direction around a maximum likelihood estimate (MLE) (q, p). In particular, the maximum likelihood estimator is non-unique and there is a complete spectrum of possible estimates. Common inference tools typically identify only a few points within this spectrum. We provide an algorithm that computes the set of equally likely (q, p) when starting from a given MLE (q, p). It is analytic for K=2 ancestral populations and numeric for K>2. We apply our algorithm to data from the 1000 Genomes Project and show that inter-European estimators of q can come with a large set of equally likely possibilities. In general, markers with large allele frequency differences between populations, in combination with individuals with concentrated admixture proportions, lead to small areas with a flat likelihood. Our findings imply that care must be taken when interpreting results from STRUCTURE and ADMIXTURE if populations are not separated well enough.
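The non-uniqueness described in the abstract can be illustrated with a minimal sketch. It assumes the standard binomial form of the admixture likelihood for diploid genotypes, in which the data depend on (q, p) only through the product q·p. The toy matrices, the function name `binomial_loglik`, and the mixing matrix `S` are hypothetical choices made for illustration; this is not the authors' algorithm, only a demonstration that two different pairs (q, p) can be equally likely.

```python
import numpy as np

def binomial_loglik(x, q, p):
    """Log-likelihood (up to an additive constant) of diploid genotypes x
    (N x M, entries 0, 1 or 2), given admixture proportions q (N x K)
    and allele frequencies p (K x M)."""
    theta = q @ p  # expected allele frequency per individual and marker
    return float(np.sum(x * np.log(theta) + (2 - x) * np.log1p(-theta)))

# Hypothetical toy data: N=3 individuals, M=3 markers, K=2 populations.
q = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.3, 0.7]])
p = np.array([[0.2, 0.6, 0.4],
              [0.7, 0.3, 0.5]])
x = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1]])

# Assumed mixing matrix: invertible, with rows summing to one,
# so the rows of q @ S still sum to one.
a, b = 0.10, 0.05
S = np.array([[1 - a, a],
              [b, 1 - b]])

q_alt = q @ S                   # alternative admixture proportions
p_alt = np.linalg.solve(S, p)   # alternative allele frequencies, S^{-1} p

# The pair is only admissible if both matrices remain valid parameters.
valid = (q_alt.min() >= 0) and (p_alt.min() >= 0) and (p_alt.max() <= 1)

ll_original = binomial_loglik(x, q, p)
ll_alternative = binomial_loglik(x, q_alt, p_alt)
print(valid, np.isclose(ll_original, ll_alternative))
```

Because q_alt·p_alt equals q·p, the two log-likelihoods coincide even though the parameter pairs differ. How far `a` and `b` can be pushed before `valid` becomes false depends on how extreme the entries of q and p are, which is in line with the abstract's remark about well-separated populations yielding small flat regions.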
Genetics