Modelling gene content across a phylogeny to determine when genes become associated

Jiahao Diao, Malgorzata M. O'Reilly, Barbara R. Holland
2023-09-06
Abstract: In this work, we develop a stochastic model of gene gain and loss with the aim of inferring when (if at all) in evolutionary history an association between two genes arises. The data we consider is a species tree along with information on the presence or absence of two genes in each of the species. The biological motivation for our model is that if two genes are involved in the same biochemical pathway, i.e. they are both required for some function, then the rate of gain or loss of one gene in the pathway should depend upon the presence or absence of the other gene in the pathway. However, if the two genes are not functionally linked, then the rate of gain or loss of one gene should be independent of the state of the other gene. We simulate data under this model to determine under what conditions a shift from the independent rates class to the dependent rates class can be detected. For example, how large a tree is required, and how large a shift in the rates is needed, before the Akaike information criterion (AIC) supports a model with two rate classes over a simpler model with just one rate class? If a model with two rate classes is preferred, can it correctly detect where on the evolutionary tree the shift occurred?
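The distinction between the independent and dependent rate classes can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that the joint presence/absence of the two genes evolves as a four-state continuous-time Markov chain in which only one gene changes at a time. The function names (`rate_matrix`, `aic`) and all numeric rates are hypothetical and chosen only for illustration; they are not the authors' parameterisation. The AIC formula is the standard one, AIC = 2k - 2 ln L.

```python
# Joint states: presence (1) or absence (0) of gene A and gene B, as (A, B).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def rate_matrix(gain, loss):
    """Build a 4x4 rate matrix for the joint state of two genes.

    gain[g][other] and loss[g][other] give the rate at which gene g
    (0 = A, 1 = B) is gained or lost when the other gene is in state
    `other` (0 = absent, 1 = present). Assumed structure, for illustration.
    """
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for i, state in enumerate(STATES):
        for g in (0, 1):
            other = state[1 - g]
            target = list(state)
            target[g] = 1 - state[g]  # flip gene g only: one event at a time
            j = STATES.index(tuple(target))
            Q[i][j] = gain[g][other] if state[g] == 0 else loss[g][other]
        Q[i][i] = -sum(Q[i])  # diagonal makes each row sum to zero
    return Q


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Independent class: each gene's rates ignore the state of the other gene.
Q_indep = rate_matrix(gain=[[0.1, 0.1], [0.2, 0.2]],
                      loss=[[0.3, 0.3], [0.4, 0.4]])

# Dependent class (hypothetical values): each gene is gained faster and
# lost more slowly when its partner gene is present.
Q_dep = rate_matrix(gain=[[0.1, 0.5], [0.2, 0.6]],
                    loss=[[0.3, 0.05], [0.4, 0.05]])
```

Under this sketch, a one-class model would apply a single matrix over the whole tree, whereas a two-class model would switch from `Q_indep` to `Q_dep` at some point on the tree; the two fits could then be compared by passing their maximised log-likelihoods and parameter counts to `aic`.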
Populations and Evolution, Probability