Text Metric Method on Statistical Manifold Learning

Zheng-yu LI, Huan-huan CHEN
DOI: https://doi.org/10.3969/j.issn.1000-1220.2018.03.021
2018-01-01
Abstract: Traditional text classification methods, including kernel methods and TF-IDF, ignore semantic information and the diversity of topic distributions over words and texts. This paper proposes a text metric method based on a topic model with a Gaussian distribution assumption and a statistical manifold learning framework. The algorithm is called text metric on statistical manifold (TMSM). TMSM extends the topic model: by using a Gaussian mixture model to describe the distribution of all words, it obtains a probabilistic text representation based on the different topic distributions. The distance between texts is then calculated by statistical manifold learning. Experimental results on text classification tasks show that TMSM outperforms all other compared methods on all datasets.
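The pipeline described in the abstract (Gaussian mixture over words, a probabilistic representation per text, then a distance on a statistical manifold) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: word vectors are toy 2-D points, the mixture components are isotropic Gaussians with hand-picked parameters, each text is summarised by its average component responsibilities, and the distance is the Fisher-Rao geodesic distance between categorical distributions, which is one standard choice on a statistical manifold and not necessarily the one TMSM uses. The function names `topic_distribution` and `fisher_rao_distance` are hypothetical.

```python
import numpy as np


def topic_distribution(word_vecs, means, variances, weights):
    """Average per-word Gaussian responsibilities into one topic distribution.

    Assumption: isotropic Gaussian components, one variance per topic.
    """
    # Squared distance from every word vector to every component mean,
    # shape (n_words, n_topics).
    sq = ((word_vecs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    dim = word_vecs.shape[1]
    # Log of (mixture weight * isotropic Gaussian density) per word and topic.
    log_p = (
        np.log(weights)
        - 0.5 * dim * np.log(2.0 * np.pi * variances)
        - 0.5 * sq / variances
    )
    # Subtract the row maximum for numerical stability before exponentiating.
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # A text is represented by the mean responsibility over its words.
    return resp.mean(axis=0)


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions."""
    # Bhattacharyya coefficient, clipped to guard against rounding above 1.
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return 2.0 * np.arccos(bc)


# Toy mixture with two "topics" (hypothetical parameters, not from the paper).
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

# Two toy "texts", each a small set of 2-D word vectors.
text_a = np.array([[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]])
text_b = np.array([[5.1, 4.9], [4.8, 5.2], [0.2, 0.0]])

p_a = topic_distribution(text_a, means, variances, weights)
p_b = topic_distribution(text_b, means, variances, weights)
d_ab = fisher_rao_distance(p_a, p_b)
```

In this sketch the distance is zero for identical topic distributions and reaches its maximum of π when two texts share no topic mass, so it can be plugged into any distance-based classifier such as nearest neighbours.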