EpiSmokEr: A robust classifier to determine smoking status from DNA methylation data

Sailalitha Bollepalli, Tellervo Korhonen, Jaakko Kaprio, Miina Ollikainen, Simon Anders
DOI: https://doi.org/10.1101/487975
2018-12-06
Abstract: Self-reported smoking status is prone to misclassification due to under-reporting, while biomarkers such as cotinine only measure recent exposure. Smoking strongly influences DNA methylation, and current, former and never smokers exhibit different methylation profiles. Two approaches were recently proposed that calculate scores from smoking-responsive DNA methylation loci, to serve as reliable indicators of long-term exposure and as potential biomarkers of smoking behavior. However, both methodologies need significant improvements before they can be applied across all populations and achieve optimal classification of individuals whose smoking habits are unknown. To improve the practical applicability of smoking-associated methylation signals, we used machine learning to train a classifier for smoking status prediction. We report the prediction performance of our classifier on three independent whole-blood test datasets, demonstrating its robustness and global applicability. Furthermore, we show that the classifier also functions in tissues other than blood. Additionally, we provide the community with an R package, EpiSmokEr, that implements our classifier for predicting smoking status in future studies.
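The abstract does not specify the model's form. As a rough illustration only, a three-class classifier of this general kind can be sketched as a multinomial logistic (softmax) model over methylation beta values. Everything below is an assumption for illustration: the CpG names `cpg_1`/`cpg_2`, the coefficients and intercepts are hypothetical toy values (not the trained EpiSmokEr model), and the function names are not the package's API.

```python
import math
from typing import Dict, List

# The three smoking categories named in the abstract.
CLASSES = ["never", "former", "current"]

# HYPOTHETICAL toy coefficients: class -> {CpG id -> weight}. Not from the paper.
TOY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "never": {"cpg_1": 4.0, "cpg_2": -1.0},
    "former": {"cpg_1": 2.0, "cpg_2": 0.0},
    "current": {"cpg_1": -4.0, "cpg_2": 1.0},
}
# HYPOTHETICAL toy intercepts, one per class.
TOY_INTERCEPTS: Dict[str, float] = {"never": 0.0, "former": 0.0, "current": 0.0}


def softmax(scores: List[float]) -> List[float]:
    """Turn raw linear scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_smoking_status(
    betas: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
    intercepts: Dict[str, float],
) -> Dict[str, float]:
    """Score one sample with a multinomial logistic (softmax) model.

    betas maps CpG id -> methylation beta value in [0, 1]; a CpG used by the
    model but absent from betas raises KeyError rather than being guessed.
    """
    scores = []
    for cls in CLASSES:
        # Linear score for this class: intercept plus weighted beta values.
        linear = intercepts[cls] + sum(
            coef * betas[cpg] for cpg, coef in weights[cls].items()
        )
        scores.append(linear)
    return dict(zip(CLASSES, softmax(scores)))


def predicted_class(probs: Dict[str, float]) -> str:
    """Return the class with the highest predicted probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    sample = {"cpg_1": 0.1, "cpg_2": 0.9}  # toy sample, not real data
    probs = predict_smoking_status(sample, TOY_WEIGHTS, TOY_INTERCEPTS)
    print(probs, predicted_class(probs))
```

With these toy weights, low methylation at `cpg_1` pushes a sample toward "current"; in a real classifier the loci and coefficients would come from training on labelled methylation data.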