Pangenome graph construction from genome alignments with Minigraph-Cactus
Glenn Hickey, Jean Monlong, Jana Ebler, Adam M. Novak, Jordan M. Eizenga, Yan Gao, Tobias Marschall, Heng Li, Benedict Paten, Haley J. Abel, Lucinda L. Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl A. Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque, Silvia Buonaiuto, Andrew Carroll, Mark J. P. Chaisson, Pi-Chuan Chang, Xian H. Chang, Haoyu Cheng, Justin Chu, Sarah Cody, Vincenza Colonna, Daniel E. Cook, Robert M. Cook-Deegan, Omar E. Cornejo, Mark Diekhans, Daniel Doerr, Peter Ebert, Evan E. Eichler, Susan Fairley, Olivier Fedrigo, Adam L. Felsenfeld, Xiaowen Feng, Christian Fischer, Paul Flicek, Giulio Formenti, Adam Frankish, Robert S. Fulton, Shilpa Garg, Erik Garrison, Nanibaa’ A. Garrison, Carlos Garcia Giron, Richard E. Green, Cristian Groza, Andrea Guarracino, Leanne Haggerty, Ira M. Hall, William T. Harvey, Marina Haukness, David Haussler, Simon Heumos, Kendra Hoekzema, Thibaut Hourlier, Kerstin Howe, Miten Jain, Erich D. Jarvis, Hanlee P. Ji, Eimear E. Kenny, Barbara A. Koenig, Alexey Kolesnikov, Jan O. Korbel, Jennifer Kordosky, Sergey Koren, HoJoon Lee, Alexandra P. Lewis, Wen-Wei Liao, Shuangjia Lu, Tsung-Yu Lu, Julian K. Lucas, Hugo Magalhães, Santiago Marco-Sola, Pierre Marijon, Charles Markello, Fergal J. Martin, Ann McCartney, Jennifer McDaniel, Karen H. Miga, Matthew W. Mitchell, Jacquelyn Mountcastle, Katherine M. Munson, Moses Njagi Mwaniki, Maria Nattestad, Sergey Nurk, Hugh E. Olsen, Nathan D. Olson, Trevor Pesout, Adam M. Phillippy, Alice B. Popejoy, David Porubsky, Pjotr Prins, Daniela Puiu, Mikko Rautiainen, Allison A. Regier, Arang Rhie, Samuel Sacco, Ashley D. Sanders, Valerie A. Schneider, Baergen I. Schultz, Kishwar Shafin, Jonas A. Sibbesen, Jouni Sirén, Michael W. Smith, Heidi J. Sofia, Ahmad N. Abou Tayoun, Françoise Thibaud-Nissen, Chad Tomlinson, Francesca Floriana Tricomi, Flavia Villani, Mitchell R. Vollger, Justin Wagner, Brian Walenz, Ting Wang, Jonathan M. D. Wood, Aleksey V. Zimin, Justin M. Zook, Human Pangenome Reference Consortium
DOI: https://doi.org/10.1038/s41587-023-01793-w
IF: 46.9
2023-05-11
Nature Biotechnology
Abstract: Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
biotechnology & applied microbiology