Uncovering Lung Cancer Risk Pathogenic Genes with Expanded Initial Node and Weighted Fusion Strategy

Wang Yi-Bin, Cheng Yong-Mei, Zhang Shao-Wu
2016-01-01
PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS
Abstract: Identifying risk pathogenic genes for lung cancer helps to understand disease pathogenesis and improve clinical practice. However, existing prediction methods based on the random walk with restart (RWR) framework share three common problems: too few initial nodes, identical node transition probabilities, and a single information source. To further improve the performance of the RWR framework, we propose a novel method named AFMFSC that identifies disease-related genes by expanding the initial nodes and applying a weighted fusion strategy, and we use lung cancer as the test case. The AFMFSC algorithm first computes augmented functional similarity scores between genes of similar disease phenotypes, based on the idea of augmented fuzzy measure similarity, and screens important genes to serve, together with the known pathogenic genes, as the expanded initial nodes. It then walks on the global protein-protein interaction (PPI) network, guided separately by a node similarity transition matrix constructed from the topological similarity properties of the PPI network and by a correlation transition matrix constructed from gene expression profiles. All genes in the network are ranked by a weighted fusion of the results obtained under the two transition matrices. Finally, the top-ranked genes in the enrichment analysis are determined to be the final risk pathogenic genes. In total, 73 significant genes are predicted to be risk pathogenic genes for lung cancer, and they are closely linked with the onset and development of the disease. Compared with existing methods for prioritizing potential disease risk genes, AFMFSC achieves a smaller average rank and is less affected by degree distribution bias, while achieving higher Top 1%, Top 5%, and AUC values. In addition, the ranking performance of the fusion strategy exceeds that of a single transition matrix or an ordinary adjacency matrix.
The AFMFSC algorithm not only predicts the risk pathogenic genes of lung cancer accurately and effectively, but can also be easily extended to identify genes related to other diseases, providing additional insights for exploring the pathogenesis of cancer.
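
The core idea described in the abstract, running a random walk with restart under two different transition matrices and fusing the resulting scores with a weight, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the restart probability `restart`, the fusion weight `alpha`, the column normalization, and the toy matrices are all assumptions made for illustration. The augmented fuzzy measure similarity step that expands the seed set is not reproduced here, so the seeds are simply given as input.

```python
import numpy as np

def column_normalize(w):
    """Turn a non-negative weight matrix into a column-stochastic matrix."""
    w = np.asarray(w, dtype=float)
    col_sums = w.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # leave all-zero columns as zeros
    return w / col_sums

def rwr(transition, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1 - r) * T p + r * p0."""
    n = transition.shape[0]
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)  # uniform mass on the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def fused_ranking(w_topology, w_expression, seeds, alpha=0.5, restart=0.7):
    """Run RWR under each matrix, then combine the scores with weight alpha."""
    p_topo = rwr(column_normalize(w_topology), seeds, restart)
    p_expr = rwr(column_normalize(w_expression), seeds, restart)
    fused = alpha * p_topo + (1 - alpha) * p_expr
    order = np.argsort(-fused)  # gene indices, highest fused score first
    return fused, order

# Hypothetical 4-gene network: topological similarity and expression correlation.
w_topology = np.array([[0, 1, 1, 0],
                       [1, 0, 1, 0],
                       [1, 1, 0, 1],
                       [0, 0, 1, 0]], dtype=float)
w_expression = np.array([[0.0, 0.9, 0.2, 0.0],
                         [0.9, 0.0, 0.5, 0.0],
                         [0.2, 0.5, 0.0, 0.3],
                         [0.0, 0.0, 0.3, 0.0]])

fused, order = fused_ranking(w_topology, w_expression, seeds=[0], alpha=0.6)
print(order)  # candidate genes sorted by fused score
```

In this sketch the fusion is a simple convex combination of the two stationary score vectors. The abstract does not specify how the fusion weights are chosen, so `alpha` would need to be tuned or replaced by the weighting scheme given in the full paper.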