Site-specific structure and stability constrained substitution models improve phylogenetic inference

Ivan Lorca-Alonso, Miguel Arenas, Ugo Bastolla
DOI: https://doi.org/10.1101/2023.01.22.525075
2024-06-28
Abstract: In previous studies, we presented site-specific substitution models of protein evolution based on selection on the folding stability of the native state (Stab-CPE), which predict the evolutionary variability across protein sites more realistically. However, those Stab-CPE models differ qualitatively from observed data, probably because they ignore changes in the native structure, even though empirical studies suggest that conservation of the native structure is a stronger selective force than selection on folding stability. Here we present novel structurally constrained substitution models (Str-CPE) based on Julian Echave's model of the structural change due to a mutation as the linear response of the protein to a perturbation, and on an explicit model of the perturbation generated by a specific amino-acid mutation. Compared to our previous Stab-CPE models, the novel Str-CPE models are more stringent (they predict lower sequence entropy and substitution rate), assign higher likelihood to multiple sequence alignments (MSAs) that include one or more known structures, and better predict the observed conservation across sites. The models that combine Str-CPE and Stab-CPE are even more stringent and fit the empirical MSAs better. We refer collectively to our models as structure and stability constrained substitution models (SSCPE). Importantly, compared with traditional empirical substitution models, the SSCPE models infer phylogenetic trees of distantly related proteins that are more similar to reference trees based on structural information. We implemented the SSCPE models in the program SSCPE.pl, freely available at https://github.com/ugobas/SSCPE, which uses RAxML-NG to infer phylogenetic trees under the SSCPE models from a concatenated alignment and a list of protein structures that overlap with it.
Evolutionary Biology