Abstract

Background
Integrating and analyzing heterogeneous genome-scale data is a major algorithmic challenge for modern systems biology. Bipartite graphs are useful for representing relationships between pairs of disparate data types, and these relationships can be interpreted by enumerating maximal bicliques. Most previously known techniques are ill-suited to this foundational task because they are relatively inefficient and do not scale well. This paper describes a new algorithm that enumerates all maximal bicliques in a bipartite graph. Unlike most previous approaches, the new method neither places undue restrictions on its input nor inflates the problem size. Efficiency comes from exploiting the structure of bipartite graphs and from computational reductions that rapidly eliminate non-maximal candidates from the search space. Selecting candidate vertices iteratively in order of non-decreasing common neighborhood size further improves efficiency and yields more balanced recursion trees.

Results
The new technique is implemented and compared with previously published approaches from graph theory and data mining, and formal time and space bounds are derived. Experiments on both random graphs and graphs constructed from functional genomics data show that the new method substantially outperforms the best previous alternatives.

Conclusions
The new method is streamlined, efficient, and particularly well suited to the study of huge and diverse biological data. A robust implementation has been incorporated into GeneWeaver, an online tool for integrating and analyzing functional genomics experiments, available at http://geneweaver.org. The large gain in scalability that this implementation provides enables users to study complex and previously intractable gene-set associations between genes and their biological functions, hierarchically and on a genome-wide scale. This practical computational resource can be adapted to almost any application environment in which bipartite graphs can model relationships between pairs of heterogeneous entities.
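The abstract describes the search only in outline. As a rough illustration, and not the authors' implementation, the minimal Python sketch below enumerates maximal bicliques by growing the V side of a candidate biclique one vertex at a time while keeping the U side equal to the common neighborhood of the chosen vertices. A list of already-explored vertices lets it discard non-maximal branches early, and at each recursion level the candidates are visited in non-decreasing order of common neighborhood size (sorted once per call, a simplification). The function name `maximal_bicliques`, the dictionary-of-sets input format, and the L/R/P/Q variable names are assumptions made for this example. Any further reductions used by the full method are not reproduced.

```python
from collections.abc import Hashable
from typing import Dict, FrozenSet, List, Set, Tuple

Biclique = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def maximal_bicliques(adj: Dict[Hashable, Set[Hashable]]) -> List[Biclique]:
    """Enumerate every maximal biclique (L, R), L in U, R in V, both non-empty.

    Hypothetical helper for illustration: `adj` maps each vertex v of side V
    to its neighbor set N(v), a subset of side U.
    """
    results: List[Biclique] = []

    def find(L: Set[Hashable], R: Set[Hashable],
             P: List[Hashable], Q: List[Hashable]) -> None:
        # L: common neighbors of R in U; P: candidates that may extend R;
        # Q: vertices already explored, used to reject non-maximal branches.
        # Visit candidates in non-decreasing order of |N(v) & L| (stable sort).
        P = sorted(P, key=lambda v: len(adj[v] & L))
        Q = list(Q)
        while P:
            x = P.pop(0)
            L_new = L & adj[x]              # common neighborhood of R + {x}
            R_new = set(R)
            R_new.add(x)
            P_new: List[Hashable] = []
            Q_new: List[Hashable] = []
            is_maximal = True
            for v in Q:
                overlap = len(adj[v] & L_new)
                if overlap == len(L_new):
                    # An explored vertex is adjacent to all of L_new, so
                    # (L_new, R_new) is not maximal: prune the whole branch.
                    is_maximal = False
                    break
                if overlap > 0:
                    Q_new.append(v)
            if is_maximal:
                for v in P:
                    overlap = len(adj[v] & L_new)
                    if overlap == len(L_new):
                        R_new.add(v)        # v is adjacent to all of L_new
                    elif overlap > 0:
                        P_new.append(v)     # partially adjacent: still a candidate
                results.append((frozenset(L_new), frozenset(R_new)))
                if P_new:
                    find(L_new, R_new, P_new, Q_new)
            Q.append(x)                     # x is now explored at this level

    universe = set().union(*adj.values())
    # Isolated V vertices cannot belong to a biclique with a non-empty U side.
    find(universe, set(), [v for v in adj if adj[v]], [])
    return results
```

For the graph with edges u1-v1, u1-v2, u2-v1, u2-v2, u2-v3 and u3-v3, passed as `{"v1": {"u1", "u2"}, "v2": {"u1", "u2"}, "v3": {"u2", "u3"}}`, the sketch reports three maximal bicliques: ({u1, u2}, {v1, v2}), ({u2}, {v1, v2, v3}) and ({u2, u3}, {v3}). The visiting order affects only the shape of the recursion tree and how early branches are pruned, not which bicliques are reported.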