Genome-wide profiling of highly similar paralogous genes using HiFi sequencing
Xiao Chen, Daniel Baker, Egor Dolzhenko, Joseph M Devaney, Jessica Noya, April S Berlyoung, Rhonda Brandon, Kathleen S Hruska, Lucas Lochovsky, Paul Kruszka, Scott Newman, Emily Farrow, Isabelle Thiffault, Tomi Pastinen, Dalia Kasperaviciute, Christian Gilissen, Lisenka Vissers, Alexander Hoischen, Seth Berger, Eric Vilain, Emmanuele Delot, UCI Genomics Research to Elucidate the Genetics of Rare Diseases (UCI GREGoR) Consortium, Michael A Eberle
DOI: https://doi.org/10.1101/2024.04.19.590294
2024-05-16
Abstract: Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of a gene family. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 families with exceptionally low within-family diversity, where extensive gene conversion and unequal crossing over have resulted in highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
Genomics