Abstract: A graph stream, which represents an evolving graph as an infinite stream of edge updates, is an emerging graph data model widely adopted in big data analysis applications. Storing these continuously produced, extremely large-scale datasets in their entirety is impractical. Therefore, graph stream summarization structures, which support approximate storage and management of graph streams, have attracted much recent attention. Existing designs commonly use a compressive matrix and a hash-based scheme that maps each edge to a bucket of the matrix, so the edges associated with the same node are stored in the same row or column. We show that, when node degrees in a graph stream are skewed, existing designs suffer from unacceptable query latency and query precision. We argue that the key to efficient graph stream summarization is to identify high-degree nodes and apply a differentiated storage strategy to their edges. However, estimating the degree of a node in a real-time graph stream is nontrivial because of strict space and time efficiency requirements. Moreover, duplicate edges make high-degree nodes harder to identify. To solve this problem, we propose Scube, an efficient summarization structure for skewed graph streams. Two factors contribute to the efficiency of Scube. First, Scube uses a space- and computation-efficient probabilistic counting scheme to identify high-degree nodes in a graph stream. Second, Scube differentiates the storage strategy for edges associated with high-degree nodes by dynamically allocating them multiple rows or columns. We conduct comprehensive experiments to evaluate the performance of Scube on large-scale real-world datasets. Compared to state-of-the-art designs, Scube reduces query latency over a graph stream by 48%-99% while achieving acceptable query accuracy.
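The abstract names the general pattern without giving its mechanics. Edges are hashed into a compressive matrix. High-degree nodes are detected with a probabilistic counter that tolerates duplicate edges, and those nodes receive extra rows. The Python sketch below illustrates that pattern under stated assumptions. It is not Scube's actual algorithm. The degree counter is a swapped-in technique, linear counting with one bitmap per node-hash slot, where a repeated edge sets the same bit and so does not inflate the estimate. The names `DegreeEstimator`, `SkewAwareMatrix`, `threshold`, and `max_rows` are hypothetical. For clarity, buckets store full `(u, v)` keys. Practical summaries typically store short fingerprints instead.

```python
import hashlib
import math
from collections import defaultdict


def _h(key: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class DegreeEstimator:
    """Linear counting per node-hash slot (a swapped-in technique, not Scube's).

    A repeated edge (u, v) always sets the same bit, so duplicates do not
    inflate u's estimated number of distinct successors.
    """

    def __init__(self, slots: int = 1024, bits: int = 256):
        self.slots, self.bits = slots, bits
        self.maps = [0] * slots  # each int is used as a `bits`-wide bitmap

    def add(self, u: str, v: str) -> None:
        slot = _h("deg:" + u) % self.slots
        self.maps[slot] |= 1 << (_h(f"edge:{u}->{v}") % self.bits)

    def estimate(self, u: str) -> float:
        bitmap = self.maps[_h("deg:" + u) % self.slots]
        zeros = self.bits - bin(bitmap).count("1")
        if zeros == 0:  # saturated bitmap: return the largest finite estimate
            return self.bits * math.log(self.bits)
        return -self.bits * math.log(zeros / self.bits)


class SkewAwareMatrix:
    """Hypothetical compressive matrix: a node normally owns one row,
    while a node with a high estimated degree is given several rows."""

    def __init__(self, width: int = 64, threshold: int = 16, max_rows: int = 8):
        self.width, self.threshold, self.max_rows = width, threshold, max_rows
        self.buckets = defaultdict(dict)  # (row, col) -> {(u, v): weight}
        self.deg = DegreeEstimator()

    def num_rows(self, u: str) -> int:
        # More estimated distinct successors -> more rows, capped at max_rows.
        est = self.deg.estimate(u)
        return max(1, min(self.max_rows, math.ceil(est / self.threshold)))

    def _rows(self, u: str) -> set:
        # Distinct physical rows currently assigned to u (hashes may collide).
        return {_h(f"row:{u}#{i}") % self.width for i in range(self.num_rows(u))}

    def _col(self, v: str) -> int:
        return _h("col:" + v) % self.width

    def insert(self, u: str, v: str, w: float = 1.0) -> None:
        self.deg.add(u, v)
        k = self.num_rows(u)
        # Spread u's edges over its k logical rows, choosing one by destination.
        i = _h("pick:" + v) % k
        row = _h(f"row:{u}#{i}") % self.width
        bucket = self.buckets[(row, self._col(v))]
        bucket[(u, v)] = bucket.get((u, v), 0.0) + w

    def edge_weight(self, u: str, v: str) -> float:
        # The bitmap only gains bits, so num_rows(u) never shrinks and the
        # current rows cover every row u has used; summing also merges copies
        # of a duplicate edge that landed in different rows as k grew.
        col = self._col(v)
        return sum(self.buckets.get((r, col), {}).get((u, v), 0.0) for r in self._rows(u))

    def successors(self, u: str) -> set:
        found = set()
        for r in self._rows(u):
            for c in range(self.width):
                for (x, v) in self.buckets.get((r, c), {}):
                    if x == u:  # filter out other nodes hashed to the same row
                        found.add(v)
        return found


if __name__ == "__main__":
    g = SkewAwareMatrix()
    for j in range(200):
        g.insert("hub", f"n{j}")  # one high-degree node
    g.insert("a", "b")
    g.insert("a", "b", 2.0)  # duplicate edge: its weight accumulates
    print("rows for hub:", g.num_rows("hub"), "rows for a:", g.num_rows("a"))
    print(g.edge_weight("a", "b"), len(g.successors("hub")))  # 3.0 200
```

This design has a trade-off. Spreading a high-degree node's edges over several rows keeps any single row from absorbing all of them. The cost is that an edge query on such a node may probe up to `max_rows` buckets instead of one.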

An Algorithmic View of Streaming Submodular Data Summarization with A Knapsack Constraint

Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Fairness in Streaming Submodular Maximization Subject to a Knapsack Constraint

Fast Algorithm for Big Data Summarization with Knapsack and Partition Matroid Constraints

Streaming Algorithms for Constrained Submodular Maximization

Scube: Efficient Summarization for Skewed Graph Streams

Streaming Algorithms for Robust Submodular Maximization

Construction of summary structures from sliding windows over data streams

New Sampling-Based Summary Structures for Sliding Windows over Data Streams

Streaming Algorithms for Maximizing k-Submodular Functions with the Multi-knapsack Constraint

Streaming Algorithm for Maximizing a Monotone Non-Submodular Function under D-Knapsack Constraint

MRST—An Efficient Monitoring Technology of Summarization on Stream Data

Improved Deterministic Streaming Algorithms for Non-monotone Submodular Maximization

Summarization and Matching of Density-Based Clusters in Streaming Environments

Submodular Optimization over Streams with Inhomogeneous Decays

Non-submodular Maximization on Massive Data Streams

Video Summarization Via Simultaneous Block Sparse Representation

Construction of synopsis for periodically updating sliding windows over data streams

Video Summarization Via Block Sparse Dictionary Selection

Thresholding Methods for Streaming Submodular Maximization with a Cardinality Constraint and Its Variants

Streaming Submodular Maximization under Noises