Dollo parsimony overestimates ancestral gene content reconstructions

Alex Gàlvez-Morante, Laurent Guéguen, Paschalis Natsidis, Maximilian J Telford, Daniel J Richter
DOI: https://doi.org/10.1093/gbe/evae062
2024-03-22
Genome Biology and Evolution
Abstract: Ancestral reconstruction is a widely used technique that has been applied to understand the evolutionary history of gain and loss of gene families. Ancestral gene content can be reconstructed via different phylogenetic methods, but many current and previous studies employ Dollo parsimony. We hypothesize that Dollo parsimony is not appropriate for ancestral gene content reconstruction inferences based on sequence homology, as Dollo parsimony is derived from the assumption that a complex character cannot be regained. This premise does not accurately model molecular sequence evolution, in which false orthology can result from sequence convergence or lateral gene transfer. The aim of this study is to test Dollo parsimony's suitability for ancestral gene content reconstruction and to compare its inferences with a maximum likelihood-based approach which allows a gene family to be gained more than once within a tree. We first compared the performance of the two approaches on a series of artificial datasets each of 5,000 genes that were simulated according to a spectrum of evolutionary rates without gene gain or loss, so that inferred deviations from the true gene count would arise only from errors in orthology inference and ancestral reconstruction. Next, we reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity. We observed that Dollo parsimony produced numerous ancestral gene content overestimations, especially at nodes closer to the root of the tree. These observations led us to the conclusion that, confirming our hypothesis, Dollo parsimony is not an appropriate method for ancestral reconstruction studies based on sequence homology.
genetics & heredity, evolutionary biology
What problem does this paper attempt to address?
The paper evaluates whether Dollo parsimony is appropriate for reconstructing ancestral gene content, and in particular whether it leads to overestimation. Specifically:

1. **Hypothesis**: The authors hypothesize that Dollo parsimony is not suitable for ancestral gene content reconstruction based on sequence homology. Dollo parsimony derives from the assumption that a complex character, once lost, cannot be regained. That assumption does not hold for molecular sequence evolution, where false orthology can arise from sequence convergence or lateral gene transfer.
2. **Methods**:
   - **Simulated datasets**: The authors first used a series of artificial datasets, each containing 5,000 genes, simulated across a spectrum of evolutionary rates with no gene gain or loss. Under this design, any deviation of the inferred gene count from the true count can arise only from errors in orthology inference and ancestral reconstruction.
   - **Empirical dataset**: Next, the authors reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity and compared the results of Dollo parsimony with those of the maximum-likelihood method.
3. **Results**:
   - **Simulated data**: Dollo parsimony often reconstructed more than the true 5,000 genes at ancestral nodes, and the overestimation was most pronounced at nodes close to the root of the tree.
   - **Empirical data**: When reconstructing the Pfam domain content of early eukaryotes, Dollo parsimony produced estimates significantly higher than those of the maximum-likelihood method. In particular, the last eukaryotic common ancestor (LECA) reconstructed by Dollo parsimony has more Pfam domains than any extant species, and the method tends to infer more domain losses than gains.
4. **Conclusion**: The results show that Dollo parsimony overestimates ancestral gene content in reconstructions based on sequence homology, especially at deeper nodes of the tree. The authors therefore conclude that it is not an appropriate method for such studies. The alternative they compare it against is a maximum-likelihood approach that allows a gene family to be gained more than once within a tree.

### Formula summary

In both expressions, $\text{Gain}(i)$ and $\text{Loss}(i)$ are read as the number of gain and loss events for a gene family on branch $i$ of the $n$ branches of the tree.

- **Dollo parsimony assumption**: A character can be gained only once, but can be lost multiple times during evolution.

\[ \sum_{i = 1}^{n}\text{Gain}(i) = 1, \quad \sum_{i = 1}^{n}\text{Loss}(i) \geq 0 \]

- **Maximum-likelihood model**: A gene family can be gained multiple times within the same tree.

\[ \sum_{i = 1}^{n}\text{Gain}(i) \geq 1, \quad \sum_{i = 1}^{n}\text{Loss}(i) \geq 0 \]

Through these analyses, the authors confirmed their hypothesis.
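
To make the single-gain assumption concrete, here is a minimal Python sketch of how a Dollo-style reconstruction can push a gene family back to the root. It is not the authors' pipeline: the eight-species tree, the species labels, and the function names `leaves` and `dollo_ancestors` are all hypothetical, and the reconstruction rule is the simple textbook one (place the single gain at the most recent common ancestor of every species carrying the family, then mark every internal node between that ancestor and a carrier as present).

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical rooted tree of eight species (assumption, not from the paper).
TREE: Tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))


def leaves(node: Tree) -> FrozenSet[str]:
    """Return the set of species labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def dollo_ancestors(tree: Tree, present: Set[str]) -> Set[FrozenSet[str]]:
    """Internal nodes (identified by their leaf sets) that a single-gain
    reconstruction marks as carrying the family."""
    present = set(present)
    if not present:
        return set()

    # Single gain: descend to the most recent common ancestor of all carriers.
    mrca = tree
    while not isinstance(mrca, str):
        holders = [child for child in mrca if present <= leaves(child)]
        if not holders:
            break
        mrca = holders[0]

    inferred: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> None:
        # A node is marked present if at least one carrier descends from it;
        # lineages without carriers are explained as losses instead.
        if isinstance(node, str) or not (leaves(node) & present):
            return
        inferred.add(leaves(node))
        for child in node:
            walk(child)

    walk(mrca)
    return inferred


ROOT = leaves(TREE)

# Family truly restricted to the clade (A, B).
clean = dollo_ancestors(TREE, {"A", "B"})

# Same family plus one spurious hit in distant species H,
# standing in for false orthology (convergence or lateral transfer).
noisy = dollo_ancestors(TREE, {"A", "B", "H"})

print(len(clean), ROOT in clean)
print(len(noisy), ROOT in noisy)
```

In this toy case, one spurious hit in a distant species moves the inferred gain from the (A, B) ancestor to the root and marks four additional ancestors as carriers, with the empty lineages explained as losses. A model that permits more than one gain could instead treat the hit in H as an independent gain.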