Structural polymorphism and diversity of human segmental duplications

Hyeonsoo Jeong, Philip C Dishuck, DongAhn Yoo, William T Harvey, Katherine M Munson, Alexandra P Lewis, Jennifer Kordosky, Gage H Garcia, Human Genome Structural Variation Consortium (HGSVC), Feyza Yilmaz, Pille Hallast, Charles Lee, Tomi Pastinen, Evan E Eichler
DOI: https://doi.org/10.1101/2024.06.04.597452
2024-06-06
Abstract: Segmental duplications (SDs) contribute significantly to human disease, evolution, and diversity, yet have been difficult to resolve at the sequence level. We present a population genetics survey of SDs by analyzing 170 human genome assemblies where the majority of SDs are fully resolved using long-read sequence assembly. Excluding the acrocentric short arms, we identify 173.2 Mbp of duplicated sequence (47.4 Mbp not present in the telomere-to-telomere reference), distinguishing fixed from structurally polymorphic events. We find that intrachromosomal SDs are among the most variable, with rare events mapping near their progenitor sequences. African genomes harbor significantly more intrachromosomal SDs and are more likely to have recently duplicated gene families with higher copy number when compared to non-African samples. A comparison to a resource of 563 million full-length Iso-Seq reads identifies 201 novel, potentially protein-coding genes corresponding to these copy number polymorphic SDs.
Genomics