Better confidence intervals in simulation-based inference of historical demography from population genetic data

Francois Rousset, Raphael Leblois, Arnaud Estoup, Jean-Michel Marin
DOI: https://doi.org/10.1101/2024.09.30.615940
2024-11-27
Abstract: We describe and evaluate a method of statistical inference of model parameters, which revisits the idea of inferring a likelihood surface using simulation when the likelihood function cannot be evaluated. The method aims in particular to provide confidence intervals with controlled coverage, and its performance is assessed accordingly. It combines the random forest machine learning method and multivariate Gaussian mixture (MGM) models in an effective inference workflow, here used to fit models with up to 15 variable parameters. Masked autoregressive flows, a deep learning technique, are also tested as an alternative to MGM models. The method is compared to approximate Bayesian computation (ABC) with random forests, with which it shares some technical features, on scenarios of inference of historical demography from population genetic data. These comparisons highlight the importance of an iterative workflow for exploring the parameter space efficiently. For equivalent simulation effort of the data-generating process, the new summary-likelihood method provides better control of coverage than ABC with random forests, and better than is generally reported for ABC methods.
Bioinformatics