Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution

Daniël P. Melters, Keith R. Bradnam, Hugh A. Young, Natalie Telis, Michael R. May, J. Graham Ruby, Robert Sebra, Paul Peluso, John Eid, David Rank, José Fernando Garcia, Joseph L. DeRisi, Timothy Smith, Christian Tobias, Jeffrey Ross-Ibarra, Ian F. Korf, Simon W.-L. Chan
DOI: https://doi.org/10.1186/gb-2013-14-1-r10
2012-09-22
Abstract: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. The assumption that the most abundant tandem repeat is the centromere DNA was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and in length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond ~50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution, including the appearance of higher order repeat structures in which several polymorphic monomers make up a larger repeating unit. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animals and plants. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
Genomics
What problem does this paper attempt to address?
This paper addresses the universality and evolutionary patterns of centromeric tandem repeats in animal and plant species. Specifically, the researchers ask the following questions:

1. **How prevalent are high-copy-number tandem repeats in the centromeres of different animal and plant species?** The researchers examined the distribution of these repeats by analyzing data from 282 species.
2. **Do centromeric tandem repeats share common characteristics?** The researchers compared the monomer length, GC content, and genomic abundance of centromeric tandem repeats across species to determine whether any of these features are conserved.
3. **How fast does centromeric DNA sequence evolve?** By comparing centromeric tandem repeats between species, the researchers evaluated how quickly these sequences change and how much conservation remains at different evolutionary distances.
4. **Which species lack high-copy-number centromeric tandem repeats?** The researchers identified species without such repeats and looked for features those species have in common.
5. **How does higher-order repeat (HOR) structure affect the evolution of centromeric tandem repeats?** The researchers analyzed how often HOR structure occurs in different species and how it contributes to the emergence of new repeat units.

Through these questions, the researchers aim to better understand the evolutionary mechanisms of centromeric DNA and its functional role in different species.
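The per-repeat properties mentioned above (monomer length, GC content, genomic abundance) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the input is a toy set of reads, and exact string matching stands in for the alignment-based repeat detection that real, polymorphic monomers would need.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def smallest_period(seq: str) -> int:
    """Length of the shortest unit whose exact tandem repetition yields seq.

    Assumes a perfect (mutation-free) array; returns len(seq) if no
    shorter period exists.
    """
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n


def abundance_fraction(monomer: str, reads: list[str]) -> float:
    """Share of all read bases covered by exact, non-overlapping monomer copies."""
    total = sum(len(r) for r in reads)
    if total == 0 or not monomer:
        return 0.0
    covered = sum(r.count(monomer) * len(monomer) for r in reads)
    return covered / total


if __name__ == "__main__":
    array = "ACGTACGTACGT"                 # toy tandem array
    period = smallest_period(array)
    monomer = array[:period]
    reads = ["ACGTACGTAAAA", "TTTT"]       # toy reads, not real data
    print(period, gc_content(monomer), abundance_fraction(monomer, reads))
```

Exact matching keeps the sketch short, but it would undercount divergent monomer copies; a tolerance for mismatches would be needed before applying anything like this to sequencing data.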