Pattern mining across massive biological networks for functional discovery

Ting Chen, Haiyan Hu
2006-01-01
Abstract: As biological network data accumulate rapidly, retrieving information from these networks has become highly important. Most existing network analysis methods analyze a single biological network, which provides only a static description of network features; algorithms that can integrate massive biological networks and provide information on network dynamics are therefore highly desirable. This thesis proposes a series of algorithms that identify patterns (connected subgraphs) across multiple biological networks in order to discover biologically meaningful modules. CODENSE is a novel algorithm for mining COherent and DENSE subgraphs across massive biological networks. Within CODENSE, a new algorithm, MODES, is developed to mine overlapping dense subgraphs in a single network. CODENSE is scalable in both the number and the size of the input graphs, flexible in mining either weighted or unweighted graphs, and adjustable between exact and approximate pattern mining. An experimental study on 39 yeast microarray datasets demonstrates that CODENSE can discover frequent dense subgraph patterns from multiple biological networks efficiently and effectively. The network biclustering algorithm, NETBICLUSTER, identifies condition-specific activation of network modules from a large number of biological networks. NETBICLUSTER not only identifies network modules corresponding to particular biological processes, but also provides information on network dynamics, such as condition-specific activation and pathway cross-talk. Applying NETBICLUSTER to 97 mouse gene co-expression networks derived from microarray datasets, we discovered a large number of condition-specific network modules. As an application study, we applied NETBICLUSTER to 32 cancer datasets and 17 control datasets to identify network signatures for cancer, and demonstrated that it was able to find network modules related to cancer. A transcriptional study of these network signatures showed potential for clarifying regulatory mechanisms in cancer. Although finding biologically meaningful network modules in large-scale, noisy, and heterogeneous biological networks remains a challenging problem, the success of the CODENSE and NETBICLUSTER algorithms suggests that a pattern mining strategy will help solve this problem and thus help biologists understand the functioning of cellular systems.