Complexes Detection in Biological Networks via Diversified Dense Subgraphs Mining

Xiuli Ma,Guangyu Zhou,Jingjing Wang,Jian Peng,Jiawei Han
DOI: https://doi.org/10.48550/arXiv.1604.03244
2016-04-12
Molecular Networks
Abstract:Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interacting patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform the job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. Most existing methods apply efficient graph clustering algorithms on PPI networks, often failing to detect possible densely connected subgraphs and overlapped subgraphs. Besides clustering-based methods, dense subgraph enumeration methods have also been used, which aim to find all densely connected protein sets. However, such methods are not practically tractable even on a small yeast PPI network, due to high computational complexity. In this paper, we introduce a novel approximate algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that we do not need to enumerate all dense subgraphs. Instead we only need to find a small subset of subgraphs that cover as many proteins as possible. The problem is formulated as finding a diverse set of dense subgraphs, where we develop highly effective pruning techniques to guarantee efficiency. To handle large networks, we take a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several human and yeast PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.
What problem does this paper attempt to address?