High Quality Phasing Using Linked-Read Whole Genome Sequencing of Patient Cohorts Informs Genetic Understanding of Complex Traits
Scott Mastromatteo, Angela Chen, Jiafen Gong, Fan Lin, Bhooma Thiruvahindrapuram, Wilson WL Sung, Joe Whitney, Zhuozhi Wang, Rohan V Patel, Katherine Keenan, Anat Halevy, Naim Panjwani, Julie Avolio, Cheng Wang, Guillaume Côté-Maurais, Stéphanie Bégin, Damien Adam, Emmanuelle Brochiero, Candice Bjornson, Mark Chilvers, April Price, Michael Parkins, Richard van Wylick, Dimas Mateos-Corral, Daniel Hughes, Mary Jane Smith, Nancy Morrison, Elizabeth Tullis, Anne L Stephenson, Pearce Wilcox, Bradley S Quon, Winnie M Leung, Melinda Solomon, Lei Sun, Felix Ratjen, Lisa J Strug
DOI: https://doi.org/10.1101/2022.03.28.486092
2022-01-01
bioRxiv
Abstract: Phasing of heterozygous alleles is critical for interpreting cis-effects of disease-relevant variation. For population studies, phase is often inferred from external data, but read-based phasing approaches that span long genomic distances would be more accurate because they enable both genotype and phase to be obtained from a single dataset. To demonstrate how read-based phasing can provide functional insights, we sequenced 477 individuals with cystic fibrosis (CF) using linked-read sequencing. We benchmark read-based phasing with different short- and long-read sequencing technologies, prioritize linked-read technology as the most informative, and produce a benchmark phase call set from reference sample HG002 for the community. The 477 samples display an average phase block N50 of 4.39 Mb. We use these samples to construct a graph representation of CFTR haplotypes, which facilitates understanding of complex CF alleles. Fine-mapping and phasing of the chr7q35 trypsinogen locus associated with CF meconium ileus demonstrate that a 20 kb deletion and a PRSS2 missense variant p.Thr8Ile (rs62473563) independently contribute to meconium ileus risk (p=0.0028 and p=0.011, respectively) and are PRSS2 pancreas eQTLs (p=9.5e-7 and p=1.4e-4, respectively), explaining the mechanism by which these polymorphisms contribute to CF. Phase enables access to haplotypes that can be used for genome graph or reference panel construction, for identification of cis-effects, and for understanding disease-associated loci. The phase information from linked-reads provides a causal explanation for variation at a CF-relevant locus, which also has implications for the genetic basis of non-CF pancreatitis, to which this locus has been reported to contribute.
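The phase block N50 quoted above summarizes how contiguous the phasing is. As a minimal sketch, assuming the common definition (the block length at which blocks of that size or larger cover at least half of the total phased length), it could be computed as follows. The function name `phase_block_n50` and the example block lengths are hypothetical, and some tools normalize by genome length rather than by total phased length, so this is not necessarily the exact calculation used in the study.

```python
from typing import Iterable


def phase_block_n50(block_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of phase block lengths (in bp).

    Assumed definition: sort blocks from longest to shortest and return
    the length of the block at which the running total first reaches at
    least half of the summed length of all blocks.
    """
    lengths = sorted(block_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        return 0  # no phased blocks at all

    running = 0
    for length in lengths:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length
    return 0  # unreachable for non-empty input; kept for completeness


# Hypothetical block lengths in bp, for illustration only.
example_blocks = [5_000_000, 4_000_000, 3_000_000, 2_000_000, 1_000_000]
print(phase_block_n50(example_blocks))
```

In this illustrative input the total phased length is 15 Mb, and the two longest blocks together cover 9 Mb, which is the first point at which at least half of the total is reached, so the N50 is the length of the second block.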
### Competing Interest Statement
DMC received an honorarium for teaching module development for Vertex Pharmaceuticals. NM is doing contract research trials for Vertex Pharmaceuticals and Abbvie. ALS has received speaking fees for educational programs sponsored by Vertex Pharmaceuticals. BSQ has received speaker fees from Vertex Pharmaceuticals and has served as site PI for several Vertex-sponsored clinical trials. WML is a study investigator for Vertex Pharmaceuticals. ET and FR act as consultants for Vertex Pharmaceuticals. MS participated in Vertex clinical trials and received payment for education modules. SM, AC, JG, FL, BT, WWLS, JW, ZW, RVP, KK, AH, NP, JA, CW, GCM, SB, DA, EB, CB, MC, AP, MP, RVW, DH, MJS, ET, PW, LS, FR, and LJS have no conflicts of interest.
### Abbreviations
* CF: cystic fibrosis
* CFTR: cystic fibrosis transmembrane conductance regulator
* WGS: whole genome sequencing
* GWAS: genome-wide association studies
* MI: meconium ileus
* LD: linkage disequilibrium
* PacBio: Pacific Biosciences
* 10XG: 10x Genomics
* CGMS: Canadian CF Gene Modifier Study Consortium
* CLR: PacBio continuous long-reads
* CCS: PacBio circular consensus sequence
* VCF: variant call format
* GIAB: Genome in a Bottle
* HMW: high molecular weight
* TCAG: The Centre for Applied Genomics
* GTEx: Genotype-Tissue Expression
* ER: endoplasmic reticulum
* SRP: signal recognition particle
* QC: quality control