Disease category-specific annotation of variants using an ensemble learning framework

Zhen Cao, Yanting Huang, Ran Duan, Peng Jin, Zhaohui S Qin, Shihua Zhang
DOI: https://doi.org/10.1093/bib/bbab438
IF: 9.5
2021-10-13
Briefings in Bioinformatics
Abstract: Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict the disease category-specific risk of genomic loci. Using disease-associated variants identified by GWAS as training data and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk predictions for 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
biochemical research methods, mathematical & computational biology