Remarkably High Repeat Content in the Genomes of Sparrows: The Importance of Genome Assembly Completeness for Transposable Element Discovery

Phred M Benham, Carla Cicero, Merly Escalona, Eric Beraut, Colin Fairbairn, Mohan P A Marimuthu, Oanh Nguyen, Ruta Sahasrabudhe, Benjamin L King, W Kelley Thomas, Adrienne I Kovach, Michael W Nachman, Rauri C K Bowie
DOI: https://doi.org/10.1093/gbe/evae067
2024-04-01
Genome Biology and Evolution
Abstract: Transposable elements (TEs) play critical roles in shaping genome evolution. Highly repetitive TE sequences are also a major source of assembly gaps, making it difficult to fully understand the impact of these elements on host genomes. The increased capacity of long-read sequencing technologies to span highly repetitive regions promises to provide new insights into patterns of TE activity across diverse taxa. Here we report the generation of highly contiguous reference genomes using PacBio long-read and Omni-C technologies for three species of Passerellidae sparrow. We compared these assemblies to three chromosome-level sparrow assemblies and nine other sparrow assemblies generated using a variety of short- and long-read technologies. All long-read based assemblies were longer (range: 1.12 to 1.41 Gb) than short-read assemblies (0.91 to 1.08 Gb), and assembly length was strongly correlated with the amount of repeat content. Repeat content for Bell's sparrow (31.2% of genome) was the highest level ever reported within the order Passeriformes, which comprises over half of avian diversity. The highest levels of repeat content (79.2% to 93.7%) were found on the W chromosome relative to other regions of the genome. Finally, we show that proliferation of different TE classes varied even among species with similar levels of repeat content. These patterns support a dynamic model of TE expansion and contraction even in a clade where TEs were once thought to be fairly depauperate and static. Our work highlights how the resolution of difficult-to-assemble regions of the genome with new sequencing technologies promises to transform our understanding of avian genome evolution.
genetics & heredity, evolutionary biology
What problem does this paper attempt to address?
The paper attempts to address the following issues:

1. **The impact of genome assembly completeness on transposable element (TE) detection**: The authors examine how the choice of sequencing technology (long-read versus short-read) affects assembly quality and, in turn, TE detection. Long reads span highly repetitive regions more reliably, which improves assembly contiguity and completeness and allows repeats that would otherwise fall into assembly gaps to be annotated.
2. **TE dynamics in sparrow genomes**: By generating and analyzing highly contiguous reference genomes for multiple sparrow species, the authors aim to reconstruct the history of TE expansion and contraction in these species and to show how the proliferation of different TE classes differs among them, even among species with similar overall repeat content.
3. **High repeat content on the W chromosome**: The study pays particular attention to the W chromosome, which is often difficult to resolve in genome assemblies. Repeat content on the W chromosome (79.2% to 93.7%) is higher than in other regions of the genome and exceeds 90% in some sparrow species.
4. **The impact of sequencing technology on genome size estimation**: By comparing assemblies produced with different sequencing technologies, the authors evaluate how well each recovers the full genome. Long-read assemblies were longer (1.12 to 1.41 Gb) than short-read assemblies (0.91 to 1.08 Gb) and closer to the estimated genome size.

In summary, the paper uses highly contiguous genome assemblies to reveal the complex dynamics of TEs in sparrow genomes and to show how sequencing technology affects the ability to detect those dynamics. These findings are significant for understanding the evolution of bird genomes.
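
Repeat content is reported above as a percentage of the genome (for example, 31.2% for Bell's sparrow). As a rough illustration of what such a figure measures, the sketch below computes the fraction of an assembly covered by annotated repeats, merging overlapping annotations so that no base is counted twice. It is not the authors' pipeline: the function name `repeat_fraction`, the `(contig, start, end)` input layout with half-open coordinates, and the toy numbers are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical input layout: (contig name, start, end), half-open coordinates.
Annotation = Tuple[str, int, int]


def repeat_fraction(annotations: Iterable[Annotation], assembly_length: int) -> float:
    """Return the fraction of assembly bases covered by at least one repeat.

    Overlapping or adjacent annotations on the same contig are merged first,
    so nested TE insertions do not inflate the total.
    """
    covered = 0
    cur_contig, cur_start, cur_end = None, 0, 0
    # Sorting groups intervals by contig, then orders them by start position.
    for contig, start, end in sorted(annotations):
        if contig != cur_contig or start > cur_end:
            # Close the previous merged interval and open a new one.
            covered += cur_end - cur_start
            cur_contig, cur_start, cur_end = contig, start, end
        else:
            # Overlap or adjacency on the same contig: extend the interval.
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start  # close the final interval
    return covered / assembly_length


# Toy values only, not data from the paper.
toy = [("chr1", 0, 10), ("chr1", 5, 20), ("chrW", 0, 90)]
print(repeat_fraction(toy, assembly_length=200))
```

Because the denominator is the assembled length, repeats that collapse into gaps in a short-read assembly are missing from both the numerator and the denominator, which is one way an incomplete assembly can understate repeat content.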