Clust-LDA: Joint Model for Text Mining and Author Group Inference

Shaoyang Ning,Xi Qu,Victor Cai,Nathan Sanders
DOI: https://doi.org/10.48550/arXiv.1810.02717
2018-10-05
Information Retrieval
Abstract:Social media corpora pose unique challenges and opportunities, including typically short document lengths and rich meta-data such as author characteristics and relationships. This creates great potential for systematic analysis of the enormous body of the users and thus provides implications for industrial strategies such as targeted marketing. Here we propose a novel and statistically principled method, clust-LDA, which incorporates authorship structure into the topical modeling, thus accomplishing the task of the topical inferences across documents on the basis of authorship and, simultaneously, the identification of groupings between authors. We develop an inference procedure for clust-LDA and demonstrate its performance on simulated data, showing that clust-LDA out-performs the "vanilla" LDA on the topic identification task where authors exhibit distinctive topical preference. We also showcase the empirical performance of clust-LDA based on a real-world social media dataset from Reddit.
What problem does this paper attempt to address?