SCIA: A Novel Gene Set Analysis Applicable to Data with Different Characteristics

Yiqun Li, Ying Wu, Xiaohan Zhang, Yunfan Bai, Luqman Muhammad Akthar, Xin Lu, Ming Shi, Jianxiang Zhao, Qinghua Jiang, Yu Li
DOI: https://doi.org/10.3389/fgene.2019.00598
IF: 3.7
2019-01-01
Frontiers in Genetics
Abstract: Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most existing methods are based on competitive testing, which assumes that genes are independent of one another. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained testing methods can solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method applicable to datasets with various complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy that uses a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES) and achieved the best performance in both false discovery rate and sensitivity under most conditions and parameter settings. Further, KEGG pathway analysis of two real lung cancer datasets showed that SCIA identified many more pathways in both datasets than GSEA did, and most of them are supported by the literature. Overall, SCIA shows promise in offering researchers more reliable and comparable results across different datasets.