Abstract: Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized. We study the directed case of this question in the context of semi-streaming and massively parallel algorithms. In particular, we show that it is possible to find a $(2+\epsilon)$ approximation on randomized streams even in a single pass by using $O(n \cdot {\rm poly} \log n)$ memory on $n$-vertex graphs. Our result improves over prior works, which were designed for arbitrary-ordered streams: the algorithm by Bahmani et al. (VLDB 2012) which uses $O(\log n)$ passes, and the work by Esfandiari et al. (2015) which makes one pass but uses $O(n^{3/2})$ memory. Moreover, our techniques extend to the Massively Parallel Computation model yielding $O(1)$ rounds in the super-linear and $O(\sqrt{\log n})$ rounds in the nearly-linear memory regime. This constitutes a quadratic improvement over state-of-the-art bounds by Bahmani et al. (VLDB 2012 and WAW 2014), which require $O(\log n)$ rounds even in the super-linear memory regime. Finally, we empirically evaluate our single-pass semi-streaming algorithm on $6$ benchmarks and show that, even on non-randomly ordered streams, the quality of its output is essentially the same as that of Bahmani et al. (VLDB 2012) while it is $2$ times faster on large graphs.
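To make the objective concrete, here is a minimal sketch of the two density measures involved. The undirected edge-to-vertex ratio comes straight from the abstract. The directed form $|E(S,T)| / \sqrt{|S| \cdot |T|}$ is the definition commonly used in the directed densest subgraph literature and is assumed here, since the abstract does not spell it out. The function names and the toy graph are illustrative only.

```python
from math import sqrt


def undirected_density(edges, vertices):
    """Edge-to-vertex ratio of the subgraph induced by `vertices`."""
    vs = set(vertices)
    if not vs:
        return 0.0
    induced = sum(1 for u, v in edges if u in vs and v in vs)
    return induced / len(vs)


def directed_density(edges, S, T):
    """Assumed directed objective: |E(S, T)| / sqrt(|S| * |T|).

    E(S, T) is the set of edges (u, v) with u in S and v in T.
    S and T may overlap.
    """
    S, T = set(S), set(T)
    if not S or not T:
        return 0.0
    e_st = sum(1 for u, v in edges if u in S and v in T)
    return e_st / sqrt(len(S) * len(T))


# Toy directed graph: a complete bipartite block {0, 1} -> {2, 3} plus one extra edge.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]

whole_graph = directed_density(edges, range(5), range(5))
dense_block = directed_density(edges, {0, 1}, {2, 3})
print(whole_graph, dense_block)
```

On this toy input the pair S = {0, 1}, T = {2, 3} scores higher than taking every vertex on both sides, which is the kind of pair a directed densest subgraph algorithm is looking for.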


### What problem does this paper attempt to solve?

This paper aims to solve the problem of efficiently finding directed dense subgraphs in large-scale graphs. Specifically, the researchers focus on how to find dense subgraphs of directed graphs quickly and effectively in the semi-streaming model and in the Massively Parallel Computation (MPC) model.

#### Background and Motivation

1. **Importance of Dense Subgraphs**:
   - Dense subgraph discovery is a fundamental tool in applications such as data mining, community detection, spam detection, fraud discovery, clustering, and graph compression.
   - For undirected graphs, many effective algorithms for finding dense subgraphs already exist. For directed graphs, existing methods are less efficient, especially on large-scale graphs.
2. **Limitations of Existing Methods**:
   - Existing semi-streaming algorithms either need multiple passes over the graph (such as Bahmani et al. [VLDB 2012]) or require a large amount of memory (such as Esfandiari et al. [2015]).
   - In the MPC model, existing directed dense subgraph algorithms require a large number of rounds, especially in the near-linear memory regime.

#### Main Contributions of the Paper

1. **Single-Pass Semi-streaming Algorithm**:
   - A single-pass semi-streaming algorithm is proposed that, on a randomized stream, outputs a (2 + ε)-approximate directed dense subgraph with high probability while using only O(n·poly log n) memory.
   - The algorithm also performs well empirically on non-randomized streams: it is twice as fast as Bahmani et al. [VLDB 2012] and has comparable or even higher accuracy.
2. **Improvements in the MPC Model**:
   - In the super-linear memory regime, an O(1)-round MPC algorithm is proposed.
   - In the near-linear memory regime, an O(√log n)-round MPC algorithm is proposed, significantly reducing the round complexity.
3. **Removal of the Assumption of the Optimal Ratio c**:
   - By guessing the value of c and running the algorithm for each guess, the assumption that the optimal ratio c is known is removed, yielding a 2(1 + ε)√δ-approximate solution.

#### Summary

This paper significantly improves the efficiency of finding directed dense subgraphs in large-scale graphs by proposing new semi-streaming and MPC algorithms, especially in the single-pass and low-memory settings. These improvements matter for processing modern large-scale directed graphs such as social networks and email networks.
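The idea of guessing c can be sketched as follows. This is an illustration under stated assumptions, not the paper's single-pass algorithm: c is taken to be the ratio |S|/|T| of the optimal pair, the guesses are powers of δ covering [1/n, n], and the fixed-c solver is a simple multi-pass threshold peeling in the spirit of Bahmani et al. (VLDB 2012). The function names `candidate_ratios`, `peel_for_ratio`, and `best_over_guesses`, and the defaults for δ and ε, are hypothetical.

```python
from collections import Counter
from math import ceil, log, sqrt


def candidate_ratios(n, delta):
    """Guesses c = delta**j covering [1/n, n]; assumes delta > 1 and n >= 2."""
    k = ceil(log(n) / log(delta))
    return [delta ** j for j in range(-k, k + 1)]


def peel_for_ratio(edges, vertices, c, eps):
    """Threshold peeling for one guess c (assumed stand-in solver).

    Returns (density, S, T) for the best pair seen while peeling, where
    density = |E(S, T)| / sqrt(|S| * |T|).
    """
    S, T = set(vertices), set(vertices)
    best = (0.0, set(), set())
    while S and T:
        live = [(u, v) for u, v in edges if u in S and v in T]
        density = len(live) / sqrt(len(S) * len(T))
        if density > best[0]:
            best = (density, set(S), set(T))
        if not live:
            break
        if len(S) / len(T) >= c:
            # S is too large relative to the guess: drop low out-degree sources.
            out_deg = Counter(u for u, _ in live)
            threshold = (1 + eps) * len(live) / len(S)
            S -= {u for u in S if out_deg[u] <= threshold}
        else:
            # T is too large relative to the guess: drop low in-degree targets.
            in_deg = Counter(v for _, v in live)
            threshold = (1 + eps) * len(live) / len(T)
            T -= {v for v in T if in_deg[v] <= threshold}
    return best


def best_over_guesses(edges, vertices, delta=2.0, eps=0.5):
    """Run the fixed-c solver for every guess and keep the densest pair."""
    vertices = list(vertices)
    guesses = candidate_ratios(len(vertices), delta)
    results = (peel_for_ratio(edges, vertices, c, eps) for c in guesses)
    return max(results, key=lambda r: r[0])


edges = [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]
density, S, T = best_over_guesses(edges, range(5))
print(density, sorted(S), sorted(T))
```

A coarser grid (larger δ) means fewer runs but a looser guess of c, which is where the √δ factor in the approximation guarantee comes from.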

Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs
