Efficient Historical Butterfly Counting in Large Temporal Bipartite Networks via Graph Structure-aware Index

Qiuyang Mang, Jingbang Chen, Hangrui Zhou, Yu Gao, Yingli Zhou, Richard Peng, Yixiang Fang, Chenhao Ma
2024-06-01
Abstract: Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, by modeling interactions between distinct entity sets. Within these graphs, the butterfly motif, a complete 2×2 biclique, represents the simplest yet significant subgraph structure, crucial for analyzing complex network patterns. Counting the butterflies offers significant benefits across various applications, including community analysis and recommender systems. Additionally, the temporal dimension of bipartite graphs, where edges activate within specific time frames, introduces the concept of historical butterfly counting, i.e., counting butterflies within a given time interval. This temporal analysis sheds light on the dynamics and evolution of network interactions, offering new insights into their mechanisms. Despite its importance, no existing algorithm can efficiently solve the historical butterfly counting task. To address this, we design two novel indices whose memory footprints are dependent on #butterflies and #wedges, respectively. Combining these indices, we propose a graph structure-aware indexing approach that significantly reduces memory usage while preserving exceptional query speed. We theoretically prove that our approach is particularly advantageous on power-law graphs, a common characteristic of real-world bipartite graphs, by surpassing traditional complexity barriers for general graphs. Extensive experiments reveal that our query algorithms outperform existing methods by up to five orders of magnitude, effectively balancing speed with manageable memory requirements.
Social and Information Networks, Databases
What problem does this paper attempt to address?
The paper primarily addresses the problem of efficiently counting historical butterflies in large temporal bipartite graphs.

### Main Problems Addressed by the Paper

1. **Historical Butterfly Counting Problem**:
   - Counting butterfly structures (i.e., 2×2 complete bipartite graphs) within specific time periods in temporal bipartite graphs (bipartite graphs that change over time).
   - This counting not only helps analyze the static structure of the network but also reveals dynamic features that change over time.
2. **Limitations of Existing Algorithms**:
   - Existing algorithms cannot effectively handle the task of historical butterfly counting, especially on large-scale datasets.
   - Counting butterflies over the entire timeline may not accurately reflect the dynamic changes in network relationships.

### Solutions

1. **Proposing New Indexing Methods**:
   - Designed two new indexing methods, with memory consumption dependent on the number of butterflies and the number of wedges, respectively.
   - Combining these two indices, a graph structure-aware indexing method (GSI) is proposed, which significantly reduces memory usage while ensuring query speed.
2. **Theoretical Proof**:
   - Proved the advantages of this method on power-law graphs (a common real-world graph model), breaking through traditional complexity barriers.
3. **Experimental Validation**:
   - Extensive experiments show that, compared to existing methods, this method improves query efficiency by up to 5 orders of magnitude while maintaining manageable memory requirements.

### Method Overview

- **Review of Related Work**: Reviewed related research on butterfly counting in static bipartite graphs, motif counting in temporal graphs, and other historical queries.
- **Problem Definition**: Formally defined the historical butterfly counting problem and introduced some necessary concepts such as bipartite graphs, butterflies, and projection graphs.
- **Technical Tools**: Used vertex-centric methods and Chazelle's data structure to handle 2D range counting problems.
- **Algorithm Introduction**:
  - **Enumeration-Based Index (EBI)**: Although query time is fast, it requires a large amount of memory to store information about all butterflies.
  - **Combination-Based Index (CBI)**: Designed to overcome the limitations of EBI in practical applications, it does not require explicitly constructing all butterflies.
  - **Graph Structure-Aware Index (GSI)**: Combines the advantages of EBI and CBI, allocating data to the two indices based on the actual structure of the graph, improving query efficiency while reducing memory consumption.
- **Handling Duplicate Edges**: Discussed how to handle temporal bipartite graphs with duplicate edges without sacrificing performance.

In summary, this paper aims to solve the problem of historical butterfly counting in temporal bipartite graphs and achieves an efficient and practical solution by proposing a series of innovative indexing methods.
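
To make the problem definition concrete, here is a minimal sketch of a naive per-query baseline. It is not the paper's EBI, CBI, or GSI index; it only illustrates what a historical butterfly query asks for. It assumes edges are given as hypothetical `(left, right, timestamp)` triples, that the query window is closed on both ends, and that duplicate edges inside the window collapse to a single edge. The function name `count_historical_butterflies` is illustrative.

```python
from collections import defaultdict
from itertools import combinations


def count_historical_butterflies(edges, t_start, t_end):
    """Count butterflies among edges whose timestamp lies in [t_start, t_end].

    Naive baseline: rebuilds the snapshot and enumerates wedges on every query.
    """
    # Snapshot: left vertex -> set of right neighbours active in the window.
    # Using a set collapses duplicate edges (an assumption of this sketch).
    adj = defaultdict(set)
    for u, v, t in edges:
        if t_start <= t <= t_end:
            adj[u].add(v)

    # A wedge is a path v1 - u - v2 centred on a left vertex u.
    # Group wedges by their pair of right-side endpoints.
    wedges = defaultdict(int)
    for nbrs in adj.values():
        for v1, v2 in combinations(sorted(nbrs), 2):
            wedges[(v1, v2)] += 1

    # Any two wedges sharing the same endpoints form exactly one butterfly.
    return sum(c * (c - 1) // 2 for c in wedges.values())


# Toy temporal bipartite graph: left vertices a, b, c; right vertices x, y.
edges = [
    ("a", "x", 1), ("a", "y", 2),
    ("b", "x", 3), ("b", "y", 4),
    ("c", "x", 5), ("c", "y", 6),
]
print(count_historical_butterflies(edges, 1, 4))  # window covering a and b only
print(count_historical_butterflies(edges, 1, 6))  # whole timeline
```

Every query here costs time proportional to the number of wedges in the window. The summary above describes the paper's indices as avoiding this kind of per-query enumeration.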