Multi-Party Sequential Data Publishing Under Differential Privacy

Peng Tang, Rui Chen, Sen Su, Shanqing Guo, Lei Ju, Gaoyuan Liu
DOI: https://doi.org/10.1109/tkde.2023.3241661
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Given a set of local sequential datasets held by multiple parties, we study the problem of publishing a synthetic dataset that preserves approximate sequentiality information of the integrated dataset while satisfying differential privacy for each local dataset. The existing solutions for publishing differentially private sequential data in the centralized setting mostly adopt tree-based approaches. Such approaches rely on different tree structures that encode the statistical information of the sequential data. A tree structure is normally constructed by recursively splitting nodes whose noisy scores (e.g., entropy or count) are larger than a given threshold. However, extending similar ideas to the multi-party setting is challenging. First, the comparison between noisy scores and a given threshold needs to be done in a distributed manner without letting the parties know the noisy scores, while satisfying differential privacy for each local dataset. Second, in the multi-party setting, the large number of node splitting decisions incurs prohibitive computation costs. To address these challenges, we present DPST, a distributed prediction suffix tree construction solution. In DPST, we first introduce a novel node splitting decision method that calculates the comparison result under encryption with substantially improved efficiency. Then we present a novel batch-based tree construction approach to reduce computation costs. To achieve high parallel performance without incurring any extra communication cost, we introduce the conjunction and slide methods to ensure that each batch contains a stable number of carefully arranged decision tasks. To further reduce communication and computation costs, we propose a prefix-based pre-pruning method that reduces the number of nodes whose splitting decisions must be made through an interactive protocol. Extensive experiments on real datasets demonstrate that our DPST solution offers desirable data utility with low computation and communication costs.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
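
The centralized tree-based idea summarized in the abstract (recursively split a node when its noisy score exceeds a threshold) can be illustrated with a minimal sketch. This is not the DPST protocol: it runs on a single dataset, uses no encryption, batching, or pre-pruning, and does no privacy budget accounting. The function names (`laplace_noise`, `count_context`, `build_tree`), the use of occurrence counts as the score, and the parameters `threshold`, `noise_scale`, and `max_depth` are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def count_context(sequences, context):
    """Count occurrences of `context` as a contiguous run in the sequences."""
    if not context:
        # Assumption: the empty (root) context matches every position.
        return sum(len(s) for s in sequences)
    k = len(context)
    return sum(
        1
        for s in sequences
        for i in range(len(s) - k + 1)
        if tuple(s[i:i + k]) == context
    )


def build_tree(sequences, alphabet, threshold, noise_scale, max_depth, seed=0):
    """Build a suffix-style context tree in the centralized setting.

    A node is split when its noisy count exceeds `threshold`; children
    extend the context by one earlier symbol. Returns a dict mapping each
    visited context (a tuple of symbols) to its noisy count.
    """
    rng = random.Random(seed)
    tree = {}
    frontier = [()]  # start from the empty context (root)
    while frontier:
        context = frontier.pop()
        noisy = count_context(sequences, context) + laplace_noise(noise_scale, rng)
        tree[context] = noisy
        # Hypothetical splitting rule: noisy score above threshold, depth bounded.
        if noisy > threshold and len(context) < max_depth:
            frontier.extend((sym,) + context for sym in alphabet)
    return tree


if __name__ == "__main__":
    data = ["abab", "abb"]
    tree = build_tree(data, "ab", threshold=1.5, noise_scale=1.0, max_depth=2)
    for context, score in sorted(tree.items()):
        print("".join(context) or "(root)", round(score, 2))
```

In the multi-party setting described in the abstract, the comparison `noisy > threshold` is the step that would have to be computed jointly without revealing the noisy score to any party; the sketch leaves that part out entirely.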