A Maximum Likelihood Method for Detecting Functional Divergence at Individual Codon Sites, with Application to Gene Family Evolution

Joseph P. Bielawski, Ziheng Yang
DOI: https://doi.org/10.1007/s00239-004-2597-8
2004-07-01
Journal of Molecular Evolution
Abstract: The tailoring of existing genetic systems to new uses is called genetic co-option. Mechanisms of genetic co-option have been difficult to study because of difficulties in identifying functionally important changes. One way to study genetic co-option in protein-coding genes is to identify those amino acid sites that have experienced changes in selective pressure following a genetic co-option event. In this paper we present a maximum likelihood method useful for measuring divergent selective pressures and identifying the amino acid sites affected by divergent selection. The method is based on a codon model of evolution and uses the nonsynonymous-to-synonymous rate ratio (ω) as a measure of selection on the protein, with ω = 1, <1, and >1 indicating neutral evolution, purifying selection, and positive selection, respectively. The model allows variation in ω among sites, with a fraction of sites evolving under divergent selective pressures. Divergent selection is indicated by different ω’s between clades, such as between paralogous clades of a gene family. We applied the codon model to duplication followed by functional divergence of (i) the ε and γ globin genes and (ii) the eosinophil cationic protein (ECP) and eosinophil-derived neurotoxin (EDN) genes. In both cases likelihood ratio tests suggested the presence of sites evolving under divergent selective pressures. Results of the ε and γ globin analysis suggested that divergent selective pressures might be a consequence of a weakened relationship between fetal hemoglobin and 2,3-diphosphoglycerate. We suggest that empirical Bayesian identification of sites evolving under divergent selective pressures, combined with structural and functional information, can provide a valuable framework for identifying and studying mechanisms of genetic co-option. Limitations of the new method are discussed.
genetics & heredity,biochemistry & molecular biology,evolutionary biology
What problem does this paper attempt to address?
The paper addresses how to detect functional divergence of proteins after gene duplication and how to identify the specific amino acid sites affected by that divergence. The authors develop a maximum likelihood method that uses the ratio of the nonsynonymous to the synonymous substitution rate ($\omega=\frac{dN}{dS}$) to measure differences in selective pressure and to pinpoint the amino acid sites where selective pressure has changed. The method is particularly suited to studying protein functional divergence following gene duplication in gene families.

### Background of the Paper

Gene duplication is one of the important mechanisms of genetic co-option. After duplication, the copies can acquire different protein functions or new expression patterns. This post-duplication functional divergence plays an important role in the adaptive differentiation of multicellular organisms. However, identifying the functionally important amino acid changes in these events is usually very difficult.

### Research Methods

The authors propose a maximum likelihood method based on a codon model of evolution. The model allows $\omega$ to vary among sites and, for a fraction of sites, allows $\omega$ to differ between clades (such as paralogous clades of a gene family). This makes it possible to detect divergent selective pressures that arise after gene duplication.

### Application Cases

The authors applied the method to two gene families:

1. **$\varepsilon$ and $\gamma$ globin genes**: These genes arose by duplication of an embryonic $\varepsilon$-type globin gene and are expressed during early development.
2. **Eosinophil cationic protein (ECP) and eosinophil-derived neurotoxin (EDN) genes**: These two genes were generated by a gene duplication event approximately 31 million years ago.

### Results

- **$\varepsilon$ and $\gamma$ globin genes**: Likelihood ratio tests (LRTs) indicated significant differences in selective pressure between the two genes at certain sites. In particular, the $\varepsilon$ gene has experienced stronger purifying selection at some sites, while the $\gamma$ gene has experienced weaker purifying selection at other sites.
- **ECP and EDN genes**: LRTs likewise indicated significant differences in selective pressure between the two genes at certain sites, which supports their functional divergence.

### Conclusions

The method provides a valuable framework for identifying and studying mechanisms of genetic co-option. Combining its site-level results with structural and functional information can give a deeper understanding of how protein function evolves after gene duplication.
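
The likelihood ratio tests mentioned in the results compare a null model, in which no sites evolve under divergent selective pressure, against an alternative that allows such sites. The following is a minimal sketch of how such a test is commonly evaluated, assuming the usual chi-square approximation for nested models. The function names, the log-likelihood values, and the degrees of freedom are hypothetical illustrations and are not taken from the paper.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: df = 1 (a = 1/2) and df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, approximate p-value) for two nested models."""
    # The statistic is twice the log-likelihood difference; it is clamped at
    # zero because small negative values can arise from optimization noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, NOT results from the paper.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=1)
    print(f"2*delta_lnL = {stat:.2f}, p = {p:.4f}")
```

With these made-up inputs the statistic is 10.0 and the p-value is about 0.0016, so the null model would be rejected at the 5% level. In a real analysis the degrees of freedom equal the number of extra free parameters in the alternative model, and the chi-square approximation may be unreliable when parameters lie on a boundary of the parameter space.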