Text Similarity Computing Based on Topic Model LDA

Zhen-zhen WANG, Ming HE, Yong-ping DU
DOI: https://doi.org/10.3969/j.issn.1002-137X.2013.12.049
2013-01-01
Computer Science
Abstract: Latent Dirichlet Allocation (LDA) is an unsupervised model that has shown strong performance in latent topic modeling of text data in recent research. This paper presents a method that improves text similarity calculation by using the LDA model. The method models both the corpus and the individual texts with LDA. Parameters are estimated by Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique, and the word probabilities are represented. The method mines the hidden relationships between topics and words in the texts, obtains the topic distribution of each text, and computes the similarity between texts. Finally, clustering experiments are carried out on the text similarity matrix to assess the clustering effect. Experimental results show that the method effectively improves the accuracy of text similarity and the quality of clustering.
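To make the last step of the pipeline concrete, the following is a minimal sketch of how a text similarity matrix could be built once each text has a topic distribution. The abstract does not state which similarity measure the authors use; the Jensen-Shannon-based score here is an assumption chosen for illustration, and the names `js_similarity`, `similarity_matrix`, and the values in `thetas` are hypothetical. In practice the topic distributions would come from an LDA model fitted by Gibbs sampling rather than being written by hand.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Return 1 minus the base-2 Jensen-Shannon divergence, a value in [0, 1].

    This measure is an assumption made for illustration, not necessarily
    the measure used in the paper.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


def similarity_matrix(thetas):
    """Pairwise similarity between all document-topic distributions."""
    return [[js_similarity(p, q) for q in thetas] for p in thetas]


# Hypothetical topic distributions for three texts over three topics.
thetas = [
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.05, 0.10, 0.85],
]
sim = similarity_matrix(thetas)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed to a clustering routine that accepts a precomputed similarity (or, after conversion to `1 - sim`, a distance) matrix. In this toy input the first two texts share a dominant topic and therefore score higher with each other than either does with the third.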