Explainable models using transcription factor binding and epigenome patterns at promoters reveal disease-associated genes and their regulators in the context of cell-types

Omkar Chandra, Durjay Pramanik, Srishti Gautam, Madhu Sharma, Niharika Dubey, Biswarup Mahato, Yuriy L Orlov, Vibhor Kumar
DOI: https://doi.org/10.1101/2024.05.06.592622
2024-10-30
Abstract: Understanding the genome-wide epigenetic regulation of diseases is important for establishing pathogenic factors and could aid disease diagnosis, prognosis, and therapeutics. In this study, we used transcription factor (TF) and co-factor profiles (n = 823) as features in machine learning models to link them to various diseases. In addition to the TF and co-factor profiles, histone modification ChIP-seq profiles (n = 621), cap analysis of gene expression (CAGE) tags (n = 255), and DNase hypersensitivity profiles (n = 255) were used as features, which allowed us to model the association of coding and non-coding genes with diseases. The predicted associations could be independently validated using genome-wide association data and survival analysis. The unique aspect of our approach is that it highlights the link between TF binding patterns and diseases in the context of cell types. Besides highlighting relevant TF binding in cell types known to be associated with diseases, it also revealed surprising links between diseases and TFs expressed in immune cells and other seemingly unrelated cell types. Further investigation showed these links to be genuine and potentially useful for prognosis, underscoring the need to deconvolve sets of known genes associated with diseases.
Bioinformatics