Gene Duplication and Evolution
LQ Zhang,BS Gaut,TJ Vision
DOI: https://doi.org/10.1126/science.293.5535.1553p
IF: 56.9
2001-01-01
Science
Abstract: Lynch and Conery (1) presented one of the first serious efforts to study the evolutionary fate of gene duplication using genomic sequence data. Their analysis led to several interesting observations, particularly with respect to the rate of gene duplication in eukaryotic genomes and the subsequent half-life of duplicates. These two parameters are of particular importance in studying the evolutionary processes of gene duplication and subsequent functional divergence. The most frequent class of duplications appeared to be similar in all six species, which suggests some silencing process for old duplicates. Several additional considerations in the analysis and interpretation, however, might have led to some different conclusions.
First, Lynch and Conery (1) used the number of substitutions per silent site, S, to measure the age of a duplicate-gene pair [figure 2 of (1)]. It is unclear, however, whether silent divergence is a suitable proxy for a molecular clock involving different genes or gene duplicates. For example, Zeng et al. (2) reported 9- to 15-fold differences in S values and a flat distribution of S for 24 single-copy genes in Drosophila. Two points are important in this context: (i) this large variation in S is expected when the divergence time is short; and (ii) the divergence time for each comparison made by Zeng et al. (2) was fixed. Thus, for different genes, S may vary by more than an order of magnitude given a fixed divergence time. This situation differs from the description of divergence time using S values from homologous genes across a group of organisms, in which a dependable molecular clock may exist. The same S values may represent duplicates of very different ages, and different S values may come from duplicates of the same or similar ages. Thus, figure 2 of (1) should be viewed with caution as a description of the age distribution of gene duplications.
A related issue is the reliability of estimates of S, because many of the values presented by Lynch and Conery (1) were larger than 1. Estimates larger than 1 are associated with a large variance due to saturation of substitutions and should generally be considered unreliable (3).
Second, the calculation of the half-life of gene duplicates was based on the untested, hidden assumption that the rate of gene duplication is constant over evolutionary time, an assumption implicit in both figure 3 and equation 3 of (1). Unfortunately, there are insufficient data with which to estimate the variation in the rate of gene duplication on a short time scale; nevertheless, there is some evidence that the duplication rate for some families may indeed not be stationary over a short evolutionary time. For example, in the mouse Sp100-rs family, a short lineage of Mus musculus has created at least 60 gene duplicates within 1.7 million years; other lineages such as the sibling taxon Mus caroli, a group that diverged 2.5 million years ago, contain few duplicates (4). If the duplication rate over the time during which divergence is observed is much lower than the recent rate of duplication, the half-life calculated by Lynch and Conery would represent a serious underestimate.
Finally, an alternative interpretation for the short half-life of duplicate genes before silencing may deserve consideration. Assuming that small values of S may more reliably reflect a short evolutionary time, the authors chose to estimate the half-life of duplicate genes only from gene pairs with S values in the range of 0 to 0.25. They estimated a mean half-life of 4 million years, concluding that “the fate awaiting most gene duplications appears to be silencing rather than preservation,” and, hence, that “duplicate genes may only rarely evolve new functions.” Yet their analysis appears to have ignored several important features of the data [figure 2 of (1)].
(i) Notwithstanding their model of “young” duplicates, the tails of the distribution are long and flat, which suggests that the data are actually heterogeneous. (ii) The proportions of the duplications that reside in the tails are high: 85% for Drosophila melanogaster, 66% for Caenorhabditis elegans, and 65% for Saccharomyces cerevisiae. (iii) The tails include old and ancient duplications. The heterogeneity of the age distribution in figure 2 of (1) suggests that the short half-life calculated from young duplicate-gene pairs cannot be extended to most pairs. After all, a large proportion of these older duplicates may be much older than 4 million years, with real ages of tens or hundreds of millions of years. It is likely that these genes have been functional since their origin; otherwise, the duplicate sequences would have been deleted from the genome (5). In addition, the absolute number of old or ancient gene duplicates is relatively large. For example, 40% of the approximately 13,600 coding sequences in the D. melanogaster genome appear to have arisen by gene duplication (6). Thus, some 34% of the fly genome, or 4624 genes [40% × 85% × 13,600, with the 85% from item (ii), above], comprise old or ancient duplicates. It is therefore misleading to assert that the vast majority of gene duplicates are quickly silenced, even if the calculation of the half-life is correct. Rather, it appears that the accumulation of “survivors” of the silencing process constitutes a large fraction of modern eukaryotic genomes. An analogy for the application of half-lives is the mortality of newborns centuries ago: At that time the infant mortality rate was very high, because medical science was underdeveloped, but just because the “half-life” of newborns is short, it does not follow that half of all adults will die shortly.
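The count of old or ancient duplicates quoted above can be reproduced in a few lines. This sketch uses only the three figures given in the text; the variable names are illustrative and not taken from any of the cited analyses.

```python
# Figures quoted in the text for D. melanogaster.
total_coding_sequences = 13_600    # approximate number of coding sequences (6)
fraction_from_duplication = 0.40   # share that appear to have arisen by duplication (6)
fraction_in_tail = 0.85            # share of duplicates in the tail, from item (ii)

# Share of all coding sequences that are old or ancient duplicates.
fraction_old = fraction_from_duplication * fraction_in_tail

# Corresponding number of genes, rounded to the nearest whole gene.
n_old = round(fraction_old * total_coding_sequences)

print(f"old or ancient duplicates: {fraction_old:.0%} of coding sequences, about {n_old} genes")
```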
We suggest that figure 2 of (1) supports a conclusion opposite to the one that Lynch and Conery drew: A large proportion of duplicate genes either have evolved new functions (7) or have been maintained by subfunctionalization (8, 9) or other mechanisms.
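The second point above, that a non-constant duplication rate can bias the half-life estimate, can be illustrated with a toy calculation. The sketch below is not equation 3 of (1). It assumes a simple model in which the expected number of surviving duplicates in an age class is the duplication rate at that time multiplied by an exponential survival term, and it estimates the loss rate from the slope of log(count) against age. The function names, the rates, and the time units are all hypothetical.

```python
import math

def expected_counts(ages, loss_rate, birth_rate):
    """Expected surviving duplicates per age class (assumed model):
    duplicates born at that age times the fraction not yet silenced."""
    return [birth_rate(t) * math.exp(-loss_rate * t) for t in ages]

def fitted_half_life(ages, counts):
    """Least-squares fit of log(count) = a - d * age; returns ln(2) / d."""
    logs = [math.log(c) for c in counts]
    mean_t = sum(ages) / len(ages)
    mean_y = sum(logs) / len(logs)
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, logs))
             / sum((t - mean_t) ** 2 for t in ages))
    return math.log(2) / -slope

ages = [0.5 * i for i in range(1, 21)]   # arbitrary time units
true_loss_rate = 0.1                     # hypothetical silencing rate
true_half_life = math.log(2) / true_loss_rate

# Case 1: duplication rate constant through time.
constant = fitted_half_life(
    ages, expected_counts(ages, true_loss_rate, lambda t: 100.0))

# Case 2: duplication rate was lower in the past (a recent burst).
recent_burst = fitted_half_life(
    ages, expected_counts(ages, true_loss_rate,
                          lambda t: 100.0 * math.exp(-0.3 * t)))

print(f"true half-life:          {true_half_life:.2f}")
print(f"fitted, constant rate:   {constant:.2f}")
print(f"fitted, recent burst:    {recent_burst:.2f}")
```

Under these assumptions the constant-rate fit recovers the true half-life, whereas the recent-burst case attributes the scarcity of older duplicates to silencing and returns a much shorter value.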