Abstract: Most biological networks have been proposed to possess a modular organization, which increases their robustness, flexibility, and stability. Many clustering methods have been used to mine biological data and partition complex networks into functional modules. Most of these methods require the number of modules to be set in advance and can therefore produce biased results. The Markov clustering method (MCL) and the simulated annealing module-detection method (SA) eliminate this requirement and can objectively separate relatively dense subgraphs. In this paper, we compared these two module-detection methods on three types of biological data: protein family classification, microarray clustering, and modularity of metabolic networks. We found that the two methods have different advantages for different biological networks. For the gene network based on Affymetrix microarray spike data, MCL recovered exactly the number of groups, and the contents of each group, defined by the spike data. For the gene network derived from actual expression data, neither method perfectly recovered the natural classification, but MCL performed slightly better than SA. However, as increasing random noise was added to the gene expression values, SA generated better modular structures with higher modularity. Next, we compared the modularization results of MCL and SA for protein family classification and found that the modules detected by SA could not be matched well with the Structural Classification of Proteins (SCOP) database, which suggests that MCL is ideally suited to the rapid and accurate detection of protein families. In addition, we used both methods to detect modules in the metabolic network of E. coli. MCL gave a trivial clustering that generated biologically insignificant modules. In contrast, SA detected modules that correspond well to the KEGG functional classification. Moreover, the modularity obtained by SA for several other metabolic networks was also much higher than that obtained by MCL. In summary, MCL is better suited to modularizing relatively complete and well-defined data, such as a protein family network. In contrast, SA is less sensitive to noise such as experimental error or incomplete data, and it outperforms MCL when modularizing gene networks based on microarray data and large-scale metabolic networks constructed from incomplete databases.
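
The abstract compares partitions by their modularity but does not define the score. As a minimal sketch, assuming the standard Newman-Girvan definition Q = sum over modules s of [ l_s / L - (d_s / 2L)^2 ], where L is the total number of edges, l_s the number of edges inside module s, and d_s the sum of the degrees of its nodes, the score of a given partition can be computed as follows. The function name `modularity`, the toy graph, and the partitions are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Newman-Girvan modularity Q of a partition of an undirected,
    unweighted graph (assumed definition; see the lead-in).

    edges     -- list of (u, v) pairs, each undirected edge listed once
    module_of -- dict mapping every node to its module label
    """
    total_edges = len(edges)
    if total_edges == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal_edges = defaultdict(int)  # l_s: edges with both ends in module s
    degree_sum = defaultdict(int)      # d_s: summed degree of nodes in module s

    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        degree_sum[mu] += 1
        degree_sum[mv] += 1
        if mu == mv:
            internal_edges[mu] += 1

    return sum(
        internal_edges[s] / total_edges
        - (degree_sum[s] / (2 * total_edges)) ** 2
        for s in degree_sum
    )


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by one bridge edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

q_two = modularity(toy_edges, two_modules)  # 5/14, about 0.357
q_one = modularity(toy_edges, one_module)   # 0.0
```

Splitting the toy graph into its two triangles scores higher than placing every node in one module, which is the sense in which a partition with "higher modularity" is a better modular structure.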

Comparison of Modularization Methods in Application to Different Biological Networks
