Authorship Attribution for Short Texts with Author-Document Topic Model.

Haowen Zhang, Peng Nie, Yanlong Wen, Xiaojie Yuan
DOI: https://doi.org/10.1007/978-3-319-99365-2_3
2018-01-01
Abstract: The goal of authorship attribution is to correctly assign texts of disputed authorship to known authors. With the growth of social media services, authorship attribution for short texts has become increasingly necessary. In earlier work, topic models such as Latent Dirichlet Allocation (LDA) have been used to find latent semantic features of authors and have improved authorship attribution performance. However, most of these approaches focus on long texts. In this paper, we propose a novel model, the Author-Document Topic Model (ADT), which models the corpus at both the author level and the document level to address authorship attribution for short texts. We also propose a new classification algorithm that calculates the similarity between texts to find the authors of anonymous texts. Experimental results on two public datasets validate the effectiveness of the proposed method.