A cystic fibrosis lung disease modifier locus harbors tandem repeats associated with gene expression
Delnaz Roshandel,Scott Mastromatteo,Cheng Wang,Jiafen Gong,Bhooma Thiruvahindrapuram,Wilson W.L. Sung,Zhuozhi Wang,Omar Hamdan,Joe Whitney,Naim Panjwani,Fan Lin,Katherine Keenan,Angela Chen,Mohsen Esmaeili,Anat Halevy,Julie Avolio,Felix Ratjen,Juan C. Celedón,Erick Forno,Wei Chen,Soyeon Kim,Lei Sun,Johanna M. Rommens,Lisa J. Strug
DOI: https://doi.org/10.1101/2022.03.28.22272580
2022-01-01
MedRxiv
Abstract: Variable number tandem repeats (VNTRs) are a major source of genetic variation in humans. However, due to their repetitive nature and large size, they are challenging to genotype by short-read sequencing, so there is limited understanding of how they contribute to complex traits such as cystic fibrosis (CF) lung function. A genome-wide association study (GWAS) of CF lung disease identified two independent signals near SLC9A3, in a region with a high density of VNTRs and CpG islands. Here, we used long-read (PacBio) phased sequences (N=58) to identify the boundaries and lengths of 49 common (frequency >2%) VNTRs in the region. Subsequently, associations of the VNTRs with gene expression were investigated in CF nasal epithelia using RNA sequencing (N=46). Two VNTRs, each tagged by one of the two GWAS signals and overlapping CpG islands, were independently associated with SLC9A3 expression in CF nasal epithelia. Together, the two VNTRs explained 24% of the variation in SLC9A3 expression. One of them was also associated with TPPP expression. We then showed that VNTR lengths can be estimated with good accuracy from short-read sequence data, using a subset of individuals with both long-read (PacBio) and short-read (10X Genomics) data (N=52). VNTR lengths were then estimated in the Genotype-Tissue Expression (GTEx) project, and their association with gene expression was investigated. Both VNTRs were associated with SLC9A3 expression in multiple non-CF GTEx tissues, including lung. The results confirm that VNTRs can explain substantial variation in gene expression and can be responsible for GWAS signals, and they highlight the critical role of long-read sequencing.
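To illustrate what "variance explained" means for the two VNTRs, the following minimal sketch fits an ordinary least squares model of expression on two VNTR length variables and reports R². It is not the authors' analysis pipeline: the data are synthetic, the effect sizes are invented, and the names `vntr_a`, `vntr_b`, and `r_squared` are hypothetical. It also assumes each individual is summarized by a single length per VNTR and ignores covariates the actual study may have adjusted for.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic illustration only: none of these values come from the study.
rng = np.random.default_rng(0)
n = 46  # same sample size as the nasal epithelia RNA-seq set, for scale only
vntr_a = rng.integers(10, 40, size=n).astype(float)  # hypothetical VNTR length
vntr_b = rng.integers(5, 25, size=n).astype(float)   # hypothetical VNTR length
expression = 0.05 * vntr_a - 0.08 * vntr_b + rng.normal(0.0, 1.0, size=n)

# Joint model with both VNTRs versus a model with one VNTR alone.
r2_joint = r_squared(np.column_stack([vntr_a, vntr_b]), expression)
r2_a_only = r_squared(vntr_a.reshape(-1, 1), expression)

print(f"R^2, both VNTRs: {r2_joint:.3f}")
print(f"R^2, VNTR A only: {r2_a_only:.3f}")
```

Comparing the joint R² with the single-VNTR R² gives a rough sense of how much additional variation the second VNTR accounts for, which is one simple way to think about two loci being "independently associated" with expression.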
### Competing Interest Statement
The authors have declared no competing interest.
### Funding Statement
Funding for this project was provided by the Cystic Fibrosis Foundation (STRUG17PO); the Canadian Institutes of Health Research (FRN 167282); CF Canada (2626); the Program for Individualized CF Therapy (CFIT), funded by the SickKids Foundation and CF Canada; and the Natural Sciences and Engineering Research Council of Canada (RGPIN: 2015-03742, 2013-250053). This work was also funded by the Government of Canada through Genome Canada (OGI-148) and supported by a grant from the Government of Ontario. The funders of the study played no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from dbGaP accession number phs000424.v8.p2 in September 2020.
### Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Canadian Cystic Fibrosis Gene Modifier Study (CGMS) was approved by the Research Ethics Board (REB) of the Hospital for Sick Children (#0020020214 from 2012-2019 and #1000065760 from 2019-present), and by the respective REBs at each of the other participating sites.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
The RNA-seq and whole-genome sequence data from CF-affected individuals are available to researchers for academic, non-commercial research purposes through the CFIT Program (https://lab.research.sickkids.ca/cfit/sequence-data-available/).
<https://github.com/strug-hub/reference-polish>