A Unified and Practical Approach for Generalized Deletion Propagation

Neha Makhija, Wolfgang Gatterbauer
2024-11-27
Abstract: Deletion Propagation problems are a family of database problems that have been studied for over 40 years. They are variants of the classical view-update problem where intended tuple deletions in the view (output of a query) are propagated back to the source (input database) in a manner that obeys certain constraints while minimizing side effects. Problems from this family have been used in domains as diverse as GDPR compliance, effective SQL pedagogy, and query explanations. However, so far these variants, their complexity, and practical algorithms have always been studied in isolation. In this paper, we unify the Deletion Propagation (DP) variants in a single generalized framework that comes with several appealing benefits: (1) Our approach not only captures all prior deletion propagation variants but also introduces a whole family of new and well-motivated problems. (2) Our algorithmic solution is general and practical. It solves problems "coarse-grained instance-optimally", i.e., our algorithm is not only guaranteed to terminate in polynomial time (PTIME) for all currently known PTIME cases, but can also leverage regularities in the data without explicitly receiving them as input (knowing about certain structural properties in data is often a prerequisite for a specialized algorithm to be applicable). (3) At the same time, our approach is not only practical (easy to implement) but also competitive with (and at times faster by orders of magnitude than) prior PTIME approaches specialized for each problem. For variants of the problem that have been studied only theoretically so far, we show the first experimental results. (4) Our approach is complete. It can solve all problem variants and covers all settings (even those that have been previously notoriously difficult to study, such as queries with self-joins, unions, and bag semantics), and it also allows us to provide new complexity results.
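To make the basic problem concrete, here is a minimal brute-force sketch of classical deletion propagation for a single two-atom join. The relations `R` and `S`, the query `Q(x, z) :- R(x, y), S(y, z)`, and the function names are hypothetical illustrations, not taken from the paper. The search minimizes the number of deleted input tuples (the source side effect) and breaks ties by the number of other output tuples lost (the view side effect). It enumerates subsets exhaustively, so it is only meant for tiny instances.

```python
from itertools import combinations

# Hypothetical toy instance for the query Q(x, z) :- R(x, y), S(y, z).
R = {("a", 1), ("b", 1), ("b", 2)}
S = {(1, "c"), (2, "c"), (2, "d")}


def evaluate(r, s):
    """Evaluate Q under set semantics: all (x, z) with R(x, y), S(y, z)."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def propagate_deletion(r, s, target):
    """Exhaustively find input deletions that remove `target` from the view.

    Primary cost: number of deleted input tuples (source side effect).
    Tie-breaker: number of other output tuples lost (view side effect).
    Returns (deleted tuples tagged with their relation, view side effect).
    """
    view = evaluate(r, s)
    inputs = [("R", t) for t in sorted(r)] + [("S", t) for t in sorted(s)]
    for k in range(1, len(inputs) + 1):
        best = None
        for deleted in combinations(inputs, k):
            r_left = r - {t for (rel, t) in deleted if rel == "R"}
            s_left = s - {t for (rel, t) in deleted if rel == "S"}
            new_view = evaluate(r_left, s_left)
            if target in new_view:
                continue  # this deletion does not propagate to the view
            side_effect = len(view - new_view - {target})
            if best is None or side_effect < best[1]:
                best = (set(deleted), side_effect)
        if best is not None:
            return best  # smallest k wins; larger deletion sets are not tried
    return None
```

On this instance, `propagate_deletion(R, S, ("b", "c"))` deletes `R(b, 1)` and `S(2, c)`. Two deletions are unavoidable because `(b, c)` has two witnesses, and this is the only pair that loses no other output tuple.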
Databases
What problem does this paper attempt to address?
This paper proposes a **unified and practical Generalized Deletion Propagation (GDP) framework** to address the limitations of existing work on Deletion Propagation (DP). Specifically, the paper aims to:

1. **Unify existing DP variants**: DP problems and their variants have been studied for more than 40 years, but usually in isolation, without a unified framework that covers all of them. The proposed GDP framework captures all known DP variants and also introduces new, well-motivated ones.
2. **Provide a general and practical algorithm**: Existing DP variants typically require a specialized algorithm for each problem, which leads to a large and complex collection of algorithms. The paper proposes a single Integer Linear Programming (ILP) formulation that solves all known PTIME cases in polynomial time and is competitive in practice.
3. **Handle complex queries and semantics**: Prior DP research focuses mainly on self-join-free conjunctive queries under set semantics, whereas practical queries often contain unions and self-joins and are evaluated under bag semantics. The GDP framework handles these more complex query and semantic settings.

### Specific Problem Description

#### Challenge 1: Innumerable Reasonable Variants

DP has existed in various forms for more than 40 years, yet many reasonable variants remain unstudied. Such variants can arise from different definitions of side effects, different constraints on which side effects are allowed, and different optimization goals. For example, Example 1 in the paper describes an airline that wants to reduce the number of flights to cut costs while minimizing the impact on its network connectivity. This problem combines Aggregate Deletion Propagation (ADP) and the Smallest Witness Problem (SWP) and extends beyond both.
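The airline scenario can be sketched in the same brute-force style. This is a hypothetical instantiation, not the paper's Example 1 verbatim: the `FLIGHTS` data is invented, "connectivity" is assumed to mean the set of city pairs reachable with at most one stop, and the goal is assumed to be deleting at least `k` flights while losing as few connections as possible. Note that the connectivity query is a union of two conjunctive queries, one of which is a self-join on `Flight`.

```python
from itertools import combinations

# Hypothetical flight network as (origin, destination) pairs; invented data.
FLIGHTS = {
    ("BOS", "JFK"), ("JFK", "BOS"),
    ("JFK", "ORD"), ("ORD", "JFK"),
    ("BOS", "ORD"), ("ORD", "SFO"),
}


def connections(flights):
    """City pairs reachable with a direct flight or exactly one stop.

    Conn(a, c) :- Flight(a, c)
    Conn(a, c) :- Flight(a, b), Flight(b, c)   (self-join on Flight)
    The two rules form a union; round trips (a == c) are ignored.
    """
    direct = set(flights)
    one_stop = {
        (a, c)
        for (a, b) in flights
        for (b2, c) in flights
        if b == b2 and a != c
    }
    return direct | one_stop


def cut_flights(flights, k):
    """Remove k flights while losing as few connections as possible.

    The connectivity query is monotone, so removing more than k flights can
    never lose fewer connections; searching subsets of size exactly k suffices.
    Returns (removed flights, lost connections).
    """
    before = connections(flights)
    best = None
    for removed in combinations(sorted(flights), k):
        lost = before - connections(flights - set(removed))
        if best is None or len(lost) < len(best[1]):
            best = (set(removed), lost)
    return best
```

Here `cut_flights(FLIGHTS, 1)` removes BOS→JFK without losing any connection, since BOS can still reach JFK via ORD, while `cut_flights(FLIGHTS, 2)` additionally removes JFK→ORD and loses only the one-stop connection JFK→SFO.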
#### Challenge 2: Different Algorithms for Similar Problems

Because DP variants are usually studied in isolation, the algorithms used to solve them also differ. Even for a single variant, different queries may require different algorithms. As a result, each new variant is typically solved from scratch, without reusing algorithmic insights. The paper addresses this with a unified framework that is "coarse-grained instance-optimal".

#### Challenge 3: Unknown Algorithms and Tractability Criteria for Practical Queries and Scenarios

Existing DP problems are usually studied only for self-join-free conjunctive queries under set semantics, because queries with self-joins are hard to analyze and their complexity boundaries are not fully understood. Practical queries, however, often contain unions and self-joins and are executed under bag semantics. The paper fills this gap with an ILP formulation that can handle such queries.

### Main Contributions of the Paper

1. **Define Generalized Deletion Propagation (GDP)**: GDP covers all known DP variants, including the Smallest Witness Problem (SWP), as well as new, well-motivated variants.
2. **Propose a unified ILP formulation**: A single ILP formulation solves all DP variants, and it solves all known PTIME cases in polynomial time.
3. **Discover new tractable cases**: The paper proves that certain queries with unions and self-joins can be solved in polynomial time by the ILP formulation under bag semantics.
4. **Experimental evaluation**: Experiments show that the approach is competitive with, and sometimes faster than, existing specialized algorithms, and that it can solve cases whose tractability was previously unknown.
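For intuition about how one ILP can express several DP variants at once, here is one standard way to write deletion propagation over witnesses as a 0-1 program. The notation is our own and this is a sketch, not necessarily the exact formulation used in the paper. Assume $D$ is the set of input tuples, each output tuple $o$ has a set of witnesses $W(o)$ (each witness is the set of input tuples that derives $o$ once), $W = \bigcup_o W(o)$, $O^-$ is the set of output tuples that must be deleted, and $\alpha, \beta \ge 0$ are hypothetical weights. Variables $x_t$, $y_w$, $z_o$ indicate that input tuple $t$ is deleted, witness $w$ is destroyed, and output tuple $o$ disappears.

$$
\begin{aligned}
\min \quad & \alpha \sum_{t \in D} x_t \;+\; \beta \sum_{o \notin O^-} z_o \\
\text{s.t.} \quad & y_w \ge x_t && \forall w \in W,\ \forall t \in w \\
& y_w \le \sum_{t \in w} x_t && \forall w \in W \\
& z_o \le y_w && \forall o,\ \forall w \in W(o) \\
& z_o \ge \sum_{w \in W(o)} y_w - |W(o)| + 1 && \forall o \\
& z_o = 1 && \forall o \in O^- \\
& x_t,\ y_w,\ z_o \in \{0, 1\}
\end{aligned}
$$

The first two constraints make $y_w = 1$ exactly when $w$ loses at least one of its tuples, and the next two make $z_o = 1$ exactly when every witness of $o$ is destroyed. Setting $\alpha = 1, \beta = 0$ gives the source-side-effect objective, and $\alpha = 0, \beta = 1$ gives the view-side-effect objective. Because the program only refers to witnesses, a self-join simply produces witnesses in which several atoms map to the same input tuple; bag semantics could be modeled, for example, by penalizing lost witnesses $\sum_w y_w$ instead of lost output tuples.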
### Summary

By proposing the Generalized Deletion Propagation (GDP) framework, this paper unifies existing deletion propagation problems and their variants and provides a general, efficient solution that applies to multiple complex query and semantic settings.