Blood- and brain-based genome-wide association studies of smoking

Aleksandra D. Chybowska, Elena Bernabeu, Paul Yousefi, Matthew Suderman, Robert F. Hillary, Louise MacGillivray, Lee Murphy, Sarah E. Harris, Janie Corley, Archie Campbell, Tara L. Spires-Jones, Daniel L. McCartney, Simon R. Cox, Jackie F. Price, Kathryn L. Evans, Riccardo E. Marioni
DOI: https://doi.org/10.1101/2024.05.21.24307663
2024-05-21
Abstract:

Background: Self-reported smoking is often incorporated into disease prediction tools but suffers from recall bias and does not capture passive exposure. Blood-based DNA methylation (DNAm) is an objective way to assess smoking. However, studies have not fully explored tissue-specificity or epigenome-wide coverage beyond array data. Here, we update the existing biomarkers of smoking and conduct a detailed analysis of the associations between blood DNAm and self-reported smoking.

Methods: A blood-based Bayesian epigenome-wide association study (EWAS) of smoking was carried out in 17,865 Generation Scotland individuals at ~850k CpG sites (Illumina EPIC array). For 24 pairs of smokers and non-smokers, a high-resolution approach was implemented (~4 million sites, TWIST methylome panel). A DNAm-derived biomarker of smoking (mCigarette) was tested in the independent Lothian Birth Cohort 1936 (n=882, Illumina 450k array) and in the ALSPAC parents and offspring at four time points (range n=496-1,207). To explore tissue-specific signals, EWASs of smoking were run across five brain regions for 14 individuals using DNAm from the EPIC array. Lastly, genome-wide association studies (GWASs) of smoking pack years and an epigenetic score for smoking (GrimAge DNAm pack years) were conducted (n=17,105).

Results: The primary EWAS analyses identified two novel genome-wide significant loci, mapping to genes related to addiction and carcinogenesis. The high-resolution EWAS of smoking (n=48) identified associations with CpG sites that are currently absent from methylation arrays. The mCigarette pack years biomarker showed excellent discrimination across all smoking categories (current, former, never), and outperformed existing predictors in associations with pack years in an external test dataset (Pearson r=0.75). Several CpGs showed near-perfect discrimination of smoking status in both blood and brain, but these loci did not overlap across tissues. The GWAS of DNAm (but not self-reported) pack years identified novel and established smoking-related loci. However, the self-reported phenotype GWAS had a higher genetic correlation with a large meta-analysis GWAS of self-reported pack years. Among the study shortcomings are its potential lack of generalizability to non-Europeans and the absence of serum cotinine data.

Conclusions: A multi-tissue, multi-cohort analysis of the relationship between smoking, DNA and DNAm (assessed via arrays and targeted sequencing) has improved our understanding of the biological consequences of smoking.