Sparse principal component analysis based on genome network for correcting cell type heterogeneity in epigenome-wide association studies

Rui Miao, Qi Dang, Jie Cai, Hai-Hui Huang, Sheng-Li Xie, Yong Liang
DOI: https://doi.org/10.1007/s11517-022-02599-9
Abstract: In epigenome-wide association studies (EWAS), the methylation level measured in a sample is a mixture of the profiles of its constituent cell types, and this heterogeneity can lead researchers to report methylation sites that are falsely associated with the phenotype of interest. To control these false discoveries, several non-reference models based on sparse principal component analysis (sparse PCA) have been proposed. These models assume that all methylation sites have the same prior probability of entering each PC loading. However, gene network structure corresponding to the methylation sites is already known, and how to integrate this genome network knowledge into sparse PCA models to improve their performance remains an open research problem. We introduce GN-ReFAEWAS, a non-reference analysis model that integrates the prior gene network structure into the sparse PCA framework to control false discoveries in EWAS. We evaluated it on one simulated data set, three real data sets, and three additional tests, and compared it with four existing models. Experimental results show that GN-ReFAEWAS outperforms the existing models by 2-90% in sensitivity, specificity, the genomic control factor λ, and the correlation coefficient factor cov with known cell-type proportions.
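The abstract does not specify how the network prior enters the sparse PCA model. As a rough illustration only, the sketch below swaps in one common way to couple PC loadings to a network: a graph-Laplacian penalty that favors similar loadings on connected sites, combined with soft-thresholding for sparsity inside a power iteration. The function `network_sparse_pc`, its parameters `lam` and `alpha`, and the adjacency matrix `A` are hypothetical and are not the GN-ReFAEWAS formulation; the simulated data in the usage section are also invented.

```python
import numpy as np


def network_sparse_pc(X, A, lam=0.1, alpha=0.1, n_iter=500, tol=1e-10):
    """Leading sparse PC loading with a network (graph Laplacian) penalty.

    Illustrative sketch, NOT the GN-ReFAEWAS algorithm. Heuristically
    maximizes v^T S v - alpha * v^T L v over unit vectors v, with
    soft-thresholding to make v sparse.

    X     : (n_samples, n_sites) methylation matrix, e.g. beta values
    A     : (n_sites, n_sites) symmetric 0/1 adjacency; hypothetical stand-in
            for a gene network mapped onto the methylation sites
    lam   : fraction of the largest |entry| removed by soft-thresholding (0 <= lam < 1)
    alpha : weight of the network smoothness penalty
    """
    Xc = X - X.mean(axis=0)                       # center each site
    S = Xc.T @ Xc / (Xc.shape[0] - 1)             # sample covariance between sites
    L = np.diag(A.sum(axis=1)) - A                # Laplacian: v^T L v = sum over edges (v_i - v_j)^2
    M = S - alpha * L                             # variance reward minus network roughness
    v = np.linalg.eigh(S)[1][:, -1]               # warm start: ordinary leading PC
    for _ in range(n_iter):
        u = M @ v                                 # power-iteration step
        thr = lam * np.abs(u).max()               # relative threshold keeps at least one site
        u = np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)  # soft-threshold -> sparse loading
        norm = np.linalg.norm(u)
        if norm == 0.0:
            raise ValueError("loading vanished; lower lam or alpha")
        u /= norm
        converged = np.linalg.norm(u - v) < tol
        v = u
        if converged:
            break
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 20
    z = rng.normal(size=n)                        # simulated cell-type proportion signal
    X = rng.normal(size=(n, p))
    X[:, :5] += 3 * z[:, None]                    # sites 0-4 track the cell-type signal
    A = np.zeros((p, p))
    for i in range(4):                            # hypothetical network: sites 0-4 form a chain
        A[i, i + 1] = A[i + 1, i] = 1
    v = network_sparse_pc(X, A)
    print(np.round(v, 2))
    pc_score = (X - X.mean(axis=0)) @ v           # per-sample component score
```

In reference-free EWAS pipelines, component scores like `pc_score` are typically added as covariates to the per-site association regression, so that variation driven by cell-type composition is not attributed to the phenotype.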