Impact and characterization of serial structural variations across humans and great apes

Wolfram Höps, Tobias Rausch, Michael Jendrusch, Hufsah Ashraf, Peter A. Audano, Ola Austine, Anna O. Basile, Christine R. Beck, Marc Jan Bonder, Marta Byrska-Bishop, Mark J. P. Chaisson, Zechen Chong, André Corvelo, Scott E. Devine, Peter Ebert, Jana Ebler, Evan E. Eichler, Mark B. Gerstein, Pille Hallast, William T. Harvey, Patrick Hasenfeld, Alex R. Hastie, Mir Henglin, Kendra Hoekzema, PingHsun Hsieh, Sarah Hunt, Miriam K. Konkel, Jennifer Kordosky, Peter M. Lansdorp, Charles Lee, Wan-Ping Lee, Alexandra P. Lewis, Chong Li, Jiadong Lin, Mark Loftus, Glennis A. Logsdon, Tobias Marschall, Ryan E. Mills, Yulia Mostovoy, Katherine M. Munson, Giuseppe Narzisi, Andy Pang, David Porubsky, Timofey Prodanov, Bernardo Rodriguez-Martin, Xinghua Shi, Likhitha Surapaneni, Michael E. Talkowski, Feyza Yilmaz, DongAhn Yoo, Weichen Zhou, Michael C. Zody, Jan O. Korbel, Fritz J. Sedlazeck
DOI: https://doi.org/10.1038/s41467-024-52027-9
IF: 16.6
2024-09-14
Nature Communications
Abstract: Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
multidisciplinary sciences