SPECIAL: Synopsis Assisted Secure Collaborative Analytics

Chenghong Wang, Lina Qiu, Johes Bater, Yukui Luo
2024-04-29
Abstract: Secure collaborative analytics (SCA) enable the processing of analytical SQL queries across multiple owners' data, even when direct data sharing is not feasible. Although essential for strong privacy, the large overhead from data-oblivious primitives in traditional SCA has hindered its practical adoption. Recent SCA variants that permit controlled leakages under differential privacy (DP) show a better balance between privacy and efficiency. However, they still face significant challenges, such as potentially unbounded privacy loss, suboptimal query planning, and lossy processing. To address these challenges, we introduce SPECIAL, the first SCA system that simultaneously ensures bounded privacy loss, advanced query planning, and lossless processing. SPECIAL employs a novel synopsis-assisted secure processing model, where a one-time privacy cost is spent to acquire private synopses (table statistics) from owner data. These synopses then allow SPECIAL to estimate (compaction) sizes for secure operations (e.g., filter, join) and index encrypted data without extra privacy loss. Crucially, these estimates and indexes can be prepared before runtime, thereby facilitating efficient query planning and accurate cost estimations. Moreover, by using one-sided noise mechanisms and private upper bound techniques, SPECIAL ensures strict lossless processing for complex queries (e.g., multi-join). Through a comprehensive benchmark, we show that SPECIAL significantly outperforms cutting-edge SCAs, with up to 80X faster query times and over 900X smaller memory for complex queries. Moreover, it also achieves up to an 89X reduction in privacy loss under continual processing.
Cryptography and Security, Databases
What problem does this paper attempt to address?
This paper addresses the main obstacles to deploying secure collaborative analytics (SCA) in practice. Traditional SCA systems rely on data-oblivious primitives to guarantee strong privacy, which incurs a large performance overhead and limits practical adoption. Recent variants improve the balance between privacy and efficiency by permitting controlled information leakage under differential privacy (DP), but they still have three key problems:

1. **Unbounded Privacy Loss**: Most existing DP-SCA systems allocate privacy budget per operation. Under continual processing, this either lets privacy loss grow without bound or forces the system to stop answering queries once the budget is exhausted.
2. **Suboptimal Execution Plans**: Existing SCA systems lack the capabilities of a traditional query optimizer. They cannot estimate costs and choose a good execution plan before runtime, so intermediate results grow large and performance suffers badly.
3. **Lossy Processing**: Because of the noise introduced by DP mechanisms, existing systems cannot give deterministic accuracy guarantees for complex queries. Stronger privacy settings add more noise and further reduce the practicality of the system.

To solve these problems, the paper proposes SPECIAL, the first SCA system that simultaneously ensures bounded privacy loss, advanced query planning, and lossless processing. Its core innovation is a new synopsis-assisted secure processing model. The model spends a one-time privacy cost to obtain private synopses (table statistics) from owner data, then uses these synopses to estimate compaction sizes for secure operations and to index encrypted data, so query processing is optimized without incurring additional privacy loss.

### Key Technical Contributions

1. **Selecting Appropriate Synopses**: SPECIAL proposes a focused strategy that prioritizes low-dimensional (1D and 2D) synopses over attributes frequently involved in joins and filters, so that the privacy budget is spent where it helps most.
2. **Achieving Lossless Processing**: SPECIAL generates synopses with one-sided DP noise and designs new primitives that pessimistically estimate filter cardinalities and index structure intervals. Because the estimates never fall below the true sizes, no data is dropped during processing.
3. **Supporting Efficient Query Processing**: The paper explores several uses of synopses to accelerate secure processing, such as building private indexes (SPEidx) and designing compact oblivious operations (SPEop). It also develops a Selinger-style query planner that combines a custom cost model with heuristics to generate efficient query execution plans.
4. **System Evaluation**: To address the lack of open-source benchmarks for evaluating SCA designs, the authors build an open-source evaluation platform and run a comprehensive benchmark on public financial data, showing significant performance advantages over existing systems.

In summary, SPECIAL targets the trade-off between privacy protection and performance in existing SCA systems by introducing a synopsis-assisted secure processing model, with the goal of making secure collaborative analytics practical.
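
The idea behind the lossless-processing contribution (one-sided noise that only ever overestimates a size) can be illustrated with a small sketch. This is not SPECIAL's actual mechanism or code: the shifted-and-clamped Laplace construction, the function names `one_sided_laplace`, `noisy_upper_bound`, and `compact`, and all parameter values are assumptions chosen for illustration, and the privacy accounting of the mechanism is not worked out here. The sketch also bounds a single filter result directly, whereas the paper derives such bounds from synopses prepared before runtime.

```python
import math
import random


def one_sided_laplace(sensitivity, epsilon, delta, rng):
    """Hypothetical helper: Laplace noise shifted right, then clamped at zero.

    The shift is chosen so the unclamped value is negative with
    probability at most `delta` (requires 0 < delta < 0.5).
    """
    scale = sensitivity / epsilon
    shift = scale * math.log(1.0 / (2.0 * delta))
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) sample.
    lap = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return max(0.0, shift + lap)


def noisy_upper_bound(true_count, sensitivity, epsilon, delta, rng):
    """Pessimistic size estimate: never smaller than the true count."""
    return true_count + math.ceil(one_sided_laplace(sensitivity, epsilon, delta, rng))


def compact(rows, predicate, bound):
    """Plaintext stand-in for an oblivious filter with compaction.

    Keeps the matching rows and pads with dummy entries (None) up to
    `bound`, so the output length reveals only the noisy bound.
    """
    matches = [r for r in rows if predicate(r)]
    if len(matches) > bound:
        raise ValueError("bound underestimates the result; rows would be lost")
    return matches + [None] * (bound - len(matches))


if __name__ == "__main__":
    rng = random.Random(0)
    rows = list(range(100))
    is_match = lambda r: r % 4 == 0  # 25 of the 100 rows match
    bound = noisy_upper_bound(25, sensitivity=1, epsilon=0.5, delta=1e-6, rng=rng)
    padded = compact(rows, is_match, bound)
    print(f"padded size: {len(padded)}, real rows: {sum(r is not None for r in padded)}")
```

Because the noise is never negative, the padded output always has room for every matching row, which is the sense in which processing stays lossless. With two-sided noise the bound could fall below the true count and matching rows would have to be dropped.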