Abstract: Structural variants (SVs) are omnipresent in human DNA, yet their genotypes and methylation status are rarely characterized because of earlier limitations in genome assembly and in the detection of modified nucleotides. As a result, the extent to which these regions act as quantitative trait loci is also largely unknown. Here, we generated a pangenome graph summarizing the SVs in 782 de novo assembled genomes from the Genomic Answers for Kids rare disease cohort. The graph captures 14.6 million CpGs in DNA segments that are absent from the CHM13v2 assembly (SV-CpGs), expanding the number of CpGs by 43.6%. Next, using 435 methylomes from the same samples, we genotyped a total of 7.99 million SV-CpGs, of which 5.18 million (64.8%) were methylated (SV-5mCpGs) in at least one sample. Examining the provenance and impact of these novel SV-CpGs, we found that non-repeat sequences were the leading contributor of SV-CpGs (3.3 × 10^6), followed by centromeric satellites (1.58 × 10^6), simple repeats (1.19 × 10^6), Alus (0.67 × 10^6), satellites (0.39 × 10^6), L1s (0.27 × 10^6), and SVAs (0.19 × 10^6). Meanwhile, the methylation rate of SV-CpGs was highest in repeat sequences. Moreover, in contrast to Alus and L1s, centromeric satellites, simple repeats, and SVA sequences were overrepresented in SV-5mCpGs compared to reference CpGs. Similarly, non-reference CpGs were more than twice as likely as reference CpGs (37% vs. 15%) to be variable, that is, to show intermediate methylation levels in the population. Lastly, to explore whether SVs detected in this pangenome are potentially causal for functional variation in the population, we measured methylation quantitative trait loci (SV-mQTLs) using CHM13v2 as a backbone. This revealed 230,464 methylation bins within 100 kbp of a common SV (>5% MAF) showing significant association (at 5% FDR) with methylation variation. Finally, we assessed how many of these SV-mQTLs were the leading QTL variant compared to SNVs and identified 65,659 methylation bins (28.5%) where the leading variant was an SV. In conclusion, our results demonstrate that graph genome references providing full SV structures, combined with the associated methylation variation, reveal tens of thousands of QTLs that are more accurately mapped in personal genomes, underscoring the importance of assembly-based analyses of human traits.
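
The lead-variant comparison described above can be illustrated with a minimal sketch. It assumes that the leading variant of a methylation bin is simply the tested variant with the smallest association p-value; the abstract does not state the exact criterion. The record type `Assoc`, the function names, and the toy values are hypothetical and are not taken from the study.

```python
from collections import namedtuple

# Hypothetical association record: one tested (bin, variant) pair.
Assoc = namedtuple("Assoc", ["bin_id", "variant_id", "variant_type", "p_value"])


def lead_variants(associations):
    """Return {bin_id: Assoc}, keeping the smallest p-value seen per bin.

    Assumption: the lead variant is the one with the minimum p-value.
    On exact ties, the first record encountered is kept.
    """
    lead = {}
    for a in associations:
        best = lead.get(a.bin_id)
        if best is None or a.p_value < best.p_value:
            lead[a.bin_id] = a
    return lead


def sv_lead_fraction(associations):
    """Fraction of bins whose lead variant is an SV rather than an SNV."""
    lead = lead_variants(associations)
    if not lead:
        return 0.0
    n_sv = sum(1 for a in lead.values() if a.variant_type == "SV")
    return n_sv / len(lead)


# Toy data (invented for illustration only).
toy = [
    Assoc("bin1", "sv_17", "SV", 1e-12),
    Assoc("bin1", "snv_3", "SNV", 1e-8),
    Assoc("bin2", "sv_42", "SV", 1e-5),
    Assoc("bin2", "snv_9", "SNV", 1e-9),
]

fraction = sv_lead_fraction(toy)  # bin1 is led by an SV, bin2 by an SNV
```

Applied to the counts reported above, the same ratio is 65,659 / 230,464, which rounds to 28.5%.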
