De novo assembly of 64 haplotype-resolved human genomes of diverse ancestry and integrated analysis of structural variation
Peter Ebert, Peter A. Audano, Qihui Zhu, Bernardo Rodriguez-Martin, David Porubsky, Marc Jan Bonder, Arvis Sulovari, Jana Ebler, Weichen Zhou, Rebecca Serra Mari, Feyza Yilmaz, Xuefang Zhao, PingHsun Hsieh, Joyce Lee, Sushant Kumar, Jiadong Lin, Tobias Rausch, Yu Chen, Jingwen Ren, Martin Santamarina, Wolfram Höps, Hufsah Ashraf, Nelson T. Chuang, Xiaofei Yang, Katherine M. Munson, Alexandra P. Lewis, Susan Fairley, Luke J. Tallon, Wayne E. Clarke, Anna O. Basile, Marta Byrska-Bishop, André Corvelo, Mark J.P. Chaisson, Junjie Chen, Chong Li, Harrison Brand, Aaron M. Wenger, Maryam Ghareghani, William T. Harvey, Benjamin Raeder, Patrick Hasenfeld, Allison Regier, Haley Abel, Ira Hall, Paul Flicek, Oliver Stegle, Mark B. Gerstein, Jose M.C. Tubio, Zepeng Mu, Yang I. Li, Xinghua Shi, Alex R. Hastie, Kai Ye, Zechen Chong, Ashley D. Sanders, Michael C. Zody, Michael E. Talkowski, Ryan E. Mills, Scott E. Devine, Charles Lee, Jan O. Korbel, Tobias Marschall, Evan E. Eichler
DOI: https://doi.org/10.1101/2020.12.16.423102
IF: 56.9
2020-01-01
Science
Abstract: Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation across even complex loci such as the major histocompatibility complex. We focus on 107,590 structural variants (SVs), of which 68% are inaccessible by short-read sequencing. We identify new SV hotspots (spanning megabases of gene-rich sequence), characterize 130 of the most active mobile element source elements, and find that 63% of all SVs arise by homology-mediated mechanisms, a twofold increase from previous studies. Our resource now enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,525 expression quantitative trait loci (SV-eQTLs) as well as SV candidates for adaptive selection within the human population.
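The abstract reports contiguity as a contig N50. For readers unfamiliar with the metric, the sketch below uses the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `contig_n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its pipeline.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Standard definition: sort contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches at least half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not data from the paper).
toy_lengths = [40, 30, 20, 10]
toy_n50 = contig_n50(toy_lengths)
```

For the toy lengths the total is 100, and the running total reaches half of that at the second-longest contig, so `toy_n50` is 30. An average contig N50 of 26 Mbp therefore means that, on average across the assemblies, half of the assembled bases lie in contigs of at least about 26 Mbp.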
### Competing Interest Statement
A.R.H. and J.L. are employees and shareholders of Bionano Genomics. A.M.W. is an employee and shareholder of Pacific Biosciences.