Abstract: Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized.
We study the directed case of this question in the context of semi-streaming and massively parallel algorithms. In particular, we show that it is possible to find a $(2+\epsilon)$ approximation on randomized streams even in a single pass by using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. Our result improves over prior works, which were designed for arbitrary-ordered streams: the algorithm by Bahmani et al. (VLDB 2012) which uses $O(\log n)$ passes, and the work by Esfandiari et al. (2015) which makes one pass but uses $O(n^{3/2})$ memory. Moreover, our techniques extend to the Massively Parallel Computation model yielding $O(1)$ rounds in the super-linear and $O(\sqrt{\log n})$ rounds in the nearly-linear memory regime. This constitutes a quadratic improvement over state-of-the-art bounds by Bahmani et al. (VLDB 2012 and WAW 2014), which require $O(\log n)$ rounds even in the super-linear memory regime.
Finally, we empirically evaluate our single-pass semi-streaming algorithm on $6$ benchmarks and show that, even on non-randomly ordered streams, the quality of its output is essentially the same as that of Bahmani et al. (VLDB 2012) while it is $2$ times faster on large graphs.
### What problem does this paper attempt to solve?
This paper aims to solve the problem of efficiently finding dense subgraphs of large-scale directed graphs. Specifically, the researchers focus on how to find such subgraphs quickly in the semi-streaming model and in the Massively Parallel Computation (MPC) model.
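The summary does not spell out the density measure. A common definition for directed graphs, assumed here rather than taken from the text above, scores a pair of vertex sets $S, T \subseteq V$ (not necessarily disjoint) by the number of edges going from $S$ to $T$, normalized by the geometric mean of the two set sizes:

$$\rho(S, T) = \frac{|E(S, T)|}{\sqrt{|S|\,|T|}}, \qquad E(S, T) = \{(u, v) \in E : u \in S,\ v \in T\}.$$

For example, if every vertex of a 2-vertex set $S$ points to every vertex of a 2-vertex set $T$, then $\rho(S, T) = 4/\sqrt{2 \cdot 2} = 2$. Under this definition, the "optimal ratio $c$" mentioned among the contributions would presumably be $|S^*|/|T^*|$ for a densest pair $(S^*, T^*)$.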
#### Background and Motivation
1. **Importance of Dense Subgraphs**:
- Dense subgraph discovery is a fundamental tool in applications such as data mining, community detection, spam detection, fraud discovery, clustering, and graph compression.
- Many effective algorithms already exist for finding dense subgraphs of undirected graphs, but existing methods for directed graphs are inefficient, especially on large-scale graphs.
2. **Limitations of Existing Methods**:
- Existing semi-streaming algorithms either require multiple passes over the stream (O(log n) passes in Bahmani et al. [VLDB 2012]) or use a large amount of memory (O(n^{3/2}) in the single-pass algorithm of Esfandiari et al. [2015]).
- In the MPC model, the previous algorithms for directed dense subgraphs (Bahmani et al. [VLDB 2012 and WAW 2014]) require O(log n) rounds, even in the super-linear memory regime.
#### Main Contributions of the Paper
1. **Single-Pass Semi-Streaming Algorithm**:
- A single-pass semi-streaming algorithm is proposed that, on a randomly ordered stream, outputs a (2 + ε)-approximate directed dense subgraph with high probability while using only O(n·poly log n) memory.
- In experiments on 6 benchmarks, the algorithm also performs well on non-randomly ordered streams: its output quality is essentially the same as that of Bahmani et al. [VLDB 2012], and it is 2 times faster on large graphs.
2. **Improvements in the MPC Environment**:
- In the super-linear memory regime, an O(1)-round MPC algorithm is proposed.
- In the near-linear memory regime, an O(√log n)-round MPC algorithm is proposed, significantly reducing the round complexity.
3. **Removing the Assumption That the Optimal Ratio c Is Known**:
- By running the algorithm for a range of guessed values of c, the assumption that the optimal ratio c is known is removed, yielding a 2(1 + ε)√δ-approximate solution.
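The idea of guessing c can be illustrated with a small in-memory sketch. It is not the paper's single-pass or MPC algorithm: it uses a multi-round peeling heuristic in the style of Bahmani et al., measures density as |E(S, T)| / √(|S|·|T|), and tries guesses c = δ^j between roughly 1/n and n. The function names, the reading of δ as the geometric step between guesses, and the default values of `eps` and `delta` are all assumptions made for illustration.

```python
import math


def density(edges, S, T):
    """Directed density |E(S,T)| / sqrt(|S||T|); 0.0 if either side is empty."""
    if not S or not T:
        return 0.0
    m = sum(1 for u, v in edges if u in S and v in T)
    return m / math.sqrt(len(S) * len(T))


def peel(edges, n, c, eps):
    """Peeling for one guessed ratio c (assumed variant, not the paper's method).

    Each round removes low-degree vertices from S (if |S|/|T| >= c) or from T
    (otherwise) and remembers the densest pair seen so far.
    """
    S, T = set(range(n)), set(range(n))
    best = (density(edges, S, T), set(S), set(T))
    while S and T:
        live = [(u, v) for u, v in edges if u in S and v in T]
        if len(S) / len(T) >= c:
            deg = {u: 0 for u in S}
            for u, _ in live:
                deg[u] += 1  # out-degree into T
            cut = (1 + eps) * len(live) / len(S)
            S = {u for u in S if deg[u] > cut}
        else:
            deg = {v: 0 for v in T}
            for _, v in live:
                deg[v] += 1  # in-degree from S
            cut = (1 + eps) * len(live) / len(T)
            T = {v for v in T if deg[v] > cut}
        d = density(edges, S, T)
        if d > best[0]:
            best = (d, set(S), set(T))
    return best


def densest_over_guesses(edges, n, eps=0.1, delta=2.0):
    """Run the peeling for every guess c = delta**j and keep the best result."""
    k = math.ceil(math.log(n, delta))
    best = (0.0, set(), set())
    for j in range(-k, k + 1):
        cand = peel(edges, n, delta ** j, eps)
        if cand[0] > best[0]:
            best = cand
    return best


# Toy graph: vertices 0 and 1 both point to 2 and 3, plus a stray edge 4 -> 5.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (4, 5)]
d, S, T = densest_over_guesses(edges, 6)
print(d, S, T)
```

On this toy graph the densest pair is S = {0, 1}, T = {2, 3} with density 2, and the guess c = 1 is enough to find it. A smaller δ means more guesses and therefore more runs, which is the trade-off reflected in the √δ factor of the stated approximation ratio.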
#### Summary
This paper significantly improves the efficiency of finding dense subgraphs of large-scale directed graphs by proposing new semi-streaming and MPC algorithms, particularly under single-pass and low-memory constraints. These improvements matter for processing modern large-scale directed graphs such as social networks and email networks.