Comprehensive Structural Variant Haplotype Panel of 943 Han Chinese from Long-Read Whole-Genome Sequencing

Tingting Gong, Yulu Zhou, Yechao Huang, Junfan Zhao, Jiao Gong, Jinxi Li, Qianqian Peng, Sijia Wang, Li Jin, Shaohua Fan
DOI: https://doi.org/10.21203/rs.3.rs-5343403/v1
2024-01-01
Abstract:
Background: Structural variations (SVs) are important genomic elements in evolution and disease, yet they remain underutilized in genome-wide association studies (GWAS) because of the technical challenges and high cost of detecting and genotyping them.
Results: We developed a comprehensive haplotype reference panel incorporating over 35 million variants, including 172,569 SVs, from 943 Han Chinese individuals. Our novel hybrid phasing approach, which combines long-read-based and statistical methods, achieved phasing accuracy in unrelated individuals comparable to trio-based phasing, and significantly reduced error rates for both small variants and SVs compared to conventional statistical phasing. The panel enabled a four-fold improvement in high-quality SV imputation and 31% higher SV imputation sensitivity compared to the expanded 1000 Genomes Project panel. GWAS analysis incorporating SVs identified 37 independent SV signals and 99 previously unreported regions across 62 skin-related phenotypes, demonstrating superior performance over GWAS using only short-read sequencing variants. Further analysis using our panel-imputed variants revealed two significantly associated SVs and two novel regions for fingerprint phenotypes, expanding upon findings from the original study that used the 1000 Genomes Project reference panel.
Conclusions: This study presents a comprehensive SV-enriched haplotype reference panel and demonstrates the value of including SVs in GWAS for understanding the genetic architecture of complex traits and diseases.