Structurally divergent and recurrently mutated regions of primate genomes

Yafei Mao, William T Harvey, David Porubsky, Katherine M Munson, Kendra Hoekzema, Alexandra P Lewis, Peter A Audano, Allison Rozanski, Xiangyu Yang, Shilong Zhang, DongAhn Yoo, David S Gordon, Tyler Fair, Xiaoxi Wei, Glennis A Logsdon, Marina Haukness, Philip C Dishuck, Hyeonsoo Jeong, Ricardo Del Rosario, Vanessa L Bauer, Will T Fattor, Gregory K Wilkerson, Yuxiang Mao, Yongyong Shi, Qiang Sun, Qing Lu, Benedict Paten, Trygve E Bakken, Alex A Pollen, Guoping Feng, Sara L Sawyer, Wesley C Warren, Lucia Carbone, Evan E Eichler
DOI: https://doi.org/10.1016/j.cell.2024.01.052
2024-03-02
Cell
Abstract:
Highlights
• Long-read sequence assembly of eight primate genomes
• Atlas of lineage-specific and recurrent structural variation
• Structurally divergent regions (SDRs) associate with lineage-specific genes
• Recurrent duplications diversify primate genes and predispose to human disease
Summary
Using multiple long-read sequencing technologies, we sequenced and assembled the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp, or ∼27% of the genome, has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions (SDRs) in which recurrent structural variation contributes to the creation of SV hotspots. In these hotspots, genes are recurrently lost (e.g., the CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), and the regions become targets of rapid chromosomal diversification and positive selection (e.g., the RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
cell biology
What problem does this paper attempt to address?