Abstract: Latent Dirichlet allocation (LDA) is a widely used, fundamental tool for text analysis. Collapsed Gibbs sampling (CGS), a widely adopted algorithm for learning the parameters of LDA, carries a risk of privacy leakage. In this paper, we study the inherent privacy of CGS, which we exploit to preserve the privacy of latent topic updates. We propose a method called group subsampling and a novel centralized privacy-preserving algorithm called Fast-Differentially-Private LDA (FDP-LDA), which together amplify the inherent privacy and improve the efficiency of traditional differentially private CGS. Theoretically, we prove a general upper bound on the amplified inherent privacy loss in each iteration of FDP-LDA. To the best of our knowledge, this is the first work to analyze the inherent privacy amplification of differentially private CGS. Experimentally, results on real-world datasets validate the improved performance of FDP-LDA.

FDP-LDA: Inherent Privacy Amplification of Collapsed Gibbs Sampling Via Group Subsampling
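
For readers unfamiliar with the collapsed Gibbs sampling (CGS) procedure that the abstract refers to, the following is a minimal sketch of the standard, non-private CGS update for LDA. It is not the FDP-LDA algorithm: it contains no group subsampling and no differential privacy mechanism, since the abstract does not specify them. The function name `collapsed_gibbs_lda`, the hyperparameter defaults `alpha` and `beta`, and the toy corpus are all illustrative assumptions. Each token's topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the counts exclude the token being resampled.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Standard (non-private) collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (assignments, doc_topic, topic_word) after `iterations` sweeps.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                               # n_k
    assignments = []

    # Random initialisation of every token's topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalised conditional p(z_i = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return assignments, doc_topic, topic_word


# Toy corpus (hypothetical): word ids 0-2 and 3-5 form two loose themes.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
z, n_dk, n_kw = collapsed_gibbs_lda(toy_docs, num_topics=2, vocab_size=6)
```

Because each update reads the word-topic counts of the whole corpus, the sampled topics depend on every document's contents, which is the kind of dependence a privacy analysis of CGS has to account for.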