Topic Mover's Distance Based Document Classification

Xinhui Wu, Hui Li
DOI: https://doi.org/10.1109/icct.2017.8359979
2017-01-01
Abstract: We propose the Topic Mover's Distance (TMD), a new topic-based distance metric for documents, inspired by the recently proposed Word Mover's Distance (WMD). Like WMD, the TMD metric measures the dissimilarity between two documents as the minimum total distance that the topics in one document need to travel to reach the topics in the other document. In our scheme, topics are the basic units for modeling documents; they are clustered from a general word-word co-occurrence matrix by the Poisson Infinite Relational Model (PIRM) and vectorized by the GloVe embedding algorithm. Experiments on document classification over six real-world datasets show that, compared with word-based WMD, the proposed TMD achieves much lower time complexity with the same accuracy.
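
The transport problem described in the abstract can be sketched as follows. This is an illustration only, assuming TMD takes the same optimal-transport form as WMD with topics substituted for words; the symbols K, p, q, v, and T are introduced here and are not taken from the paper.

```latex
% Assumed notation (illustrative, not from the paper):
%   K        number of topics
%   p_i, q_j weight of topic i in document d and of topic j in document d'
%            (nonnegative, each summing to 1)
%   v_i      embedding vector of topic i
%   T_{ij}   amount of topic i in d that is moved to topic j in d'
\[
\mathrm{TMD}(d, d') \;=\; \min_{T \ge 0} \; \sum_{i=1}^{K} \sum_{j=1}^{K} T_{ij}\, \lVert v_i - v_j \rVert_2
\quad \text{subject to} \quad
\sum_{j=1}^{K} T_{ij} = p_i \;\; \forall i,
\qquad
\sum_{i=1}^{K} T_{ij} = q_j \;\; \forall j .
\]
```

Under this reading, the linear program has K x K variables rather than one per pair of distinct words, so a topic count much smaller than the vocabulary size would be consistent with the lower time cost the abstract reports.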