Abstract: In this work, we study the cost-efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph with datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR), where the goal is to minimize the total retrieval cost while keeping the storage cost bounded. This problem (along with its variants) was introduced by Bhattacherjee et al. [VLDB'15]. While such problems are frequently encountered in collaborative tools (e.g., version control systems and data analysis pipelines), to the best of our knowledge, no existing research studies the theoretical aspects of these problems. We establish that the currently best-known heuristic, LMG, can perform arbitrarily badly in a simple worst case. Moreover, we show that it is hard to get an $o(n)$-approximation for MSR on general graphs even if we relax the storage constraints by an $O(\log n)$ factor. Similar hardness results are shown for other variants. Meanwhile, we propose polynomial-time approximation schemes for tree-like graphs, motivated by the fact that the graphs arising in practice from typical edit operations are often not arbitrary. As version graphs typically have low treewidth, we further develop new algorithms for bounded-treewidth graphs. Furthermore, we propose two new heuristics and evaluate them empirically. First, we extend LMG by considering more potential "moves", to propose a new heuristic, LMG-All. LMG-All consistently outperforms LMG while having comparable run time on a wide variety of datasets, i.e., version graphs. Second, we apply our tree algorithms on the minimum-storage arborescence of an instance, yielding algorithms that are qualitatively better than all previous heuristics for MSR, as well as for another variant, BoundedMin Retrieval (BMR).
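To make the MSR objective concrete, the following is a minimal brute-force sketch on a toy version graph. It is not one of the algorithms or heuristics from the paper (it is exponential in the number of edges and only illustrates the objective). The cost model is an assumption made for illustration: a hypothetical auxiliary node `root` has an edge to every version, and storing that edge means materializing the version in full; every other edge is a delta with its own storage and retrieval cost; a version's retrieval cost is the cheapest path from `root` over the stored edges. All names and numbers below are invented for the example.

```python
from itertools import combinations
import heapq

ROOT = "root"  # hypothetical auxiliary node: storing root->v means "keep v in full"

def retrieval_costs(nodes, stored_edges):
    """Cheapest retrieval cost of every version (Dijkstra from ROOT over stored edges)."""
    adj = {}
    for u, v, _storage, retrieval in stored_edges:
        adj.setdefault(u, []).append((v, retrieval))
    dist = {ROOT: 0}
    heap = [(0, ROOT)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # Versions that cannot be reconstructed get infinite cost.
    return {v: dist.get(v, float("inf")) for v in nodes}

def brute_force_msr(nodes, edges, storage_budget):
    """Try every subset of edges; keep the feasible one with the least total retrieval."""
    best_total, best_plan = float("inf"), None
    for k in range(len(edges) + 1):
        for plan in combinations(edges, k):
            if sum(e[2] for e in plan) > storage_budget:
                continue  # violates the storage bound
            total = sum(retrieval_costs(nodes, plan).values())
            if total < best_total:
                best_total, best_plan = total, plan
    return best_total, best_plan

# Toy instance (assumed values): edges are (source, target, storage, retrieval).
nodes = ["A", "B", "C"]
edges = [
    (ROOT, "A", 10, 10), (ROOT, "B", 10, 10), (ROOT, "C", 10, 10),
    ("A", "B", 2, 3), ("B", "C", 2, 3),
]

for budget in (14, 22, 30):
    total, plan = brute_force_msr(nodes, edges, budget)
    print(budget, total, plan)
```

On this toy instance, loosening the storage budget lets more versions be materialized, which shortens delta chains and lowers the total retrieval cost; that storage/retrieval trade-off is what MSR optimizes.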

On Graph Deltas for Historical Queries

Durable Queries over Historical Time Series

AeonG: An Efficient Built-in Temporal Support in Graph Databases

Evaluating Continuous Basic Graph Patterns over Dynamic Link Data Graphs

Incremental View Maintenance for Deductive Graph Databases Using Generalized Discrimination Networks

Suitability of Graph Database Technology for the Analysis of Spatio-Temporal Data

To Store or Not to Store: a graph theoretical approach for Dataset Versioning

Join Processing for Graph Patterns: An Old Dog with New Tricks

Towards Temporal Graph Databases

Localized RETE for Incremental Graph Queries

Keyword Search on Temporal Graphs

Achieving Sub-second Pairwise Query over Evolving Graphs

Analytic Queries over Geospatial Time-Series Data Using Distributed Hash Tables

Graph versioning for evolving urban data

Continuous Queries for Multi-Relational Graphs

Optimizing Navigational Graph Queries

Efficient Computation of Distance Labeling for Decremental Updates in Large Dynamic Graphs

The GraphTempo Framework for Exploring the Evolution of a Graph through Pattern Aggregation

Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries

Differentially Private Algorithms for Graphs Under Continual Observation

Graph Generation via Reverse Iterative Query Processing