Industrial Federated Topic Modeling

Di Jiang,Yongxin Tong,Yuanfeng Song,Xueyang Wu,Weiwei Zhao,Jinhua Peng,Rongzhong Lian,Qian Xu,Qiang Yang
DOI: https://doi.org/10.1145/3418283
IF: 5
2021-02-01
ACM Transactions on Intelligent Systems and Technology
Abstract:Probabilistic topic modeling has been applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: A party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this article, we propose a framework named Industrial Federated Topic Modeling (iFTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries. iFTM is inspired by federated learning, supports two representative topic models (i.e., Latent Dirichlet Allocation and SentenceLDA) in industrial applications, and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct quantitative evaluations to verify the effectiveness of iFTM and deploy iFTM in two real-life applications to demonstrate its utility. Experimental results verify iFTM’s superiority over conventional topic modeling.
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?