PSpec-SQL: Enabling Fine-Grained Control for Distributed Data Analytics.

Chen Luo, Fei He, Fei Peng, Dong Yan, Dan Zhang, Xin Zhou
DOI: https://doi.org/10.1109/tdsc.2019.2914209
2021-01-01
IEEE Transactions on Dependable and Secure Computing
Abstract: Business organizations regularly collect customer data to improve their services. They may want to share these data internally or even with third parties to maximize data utility. Because business data contain large amounts of customer information, organizations must respect customers' privacy as required by privacy laws. In this paper, we present PSpec-SQL, a distributed data analytics system that automatically enforces privacy compliance for SQL queries. Our system provides a high-level language, PSpec, for the data owner to specify her data usage policy. The data analyst queries data as usual to perform data analysis, but our system checks each query to ensure that only policy-compliant queries are executed. We have implemented a prototype of PSpec-SQL on top of Spark-SQL and carried out a case study on the TPC benchmarks. The results show the practicability of our system, with negligible overhead on query processing.