Graph Regularized Non-negative Matrix Factorization with Long-tail Constraint

Lu You, Rui Liu, He Zhang, Z. M. Shan
DOI: https://doi.org/10.1109/pacrim47961.2019.8985119
2019-01-01
Abstract: Mining long-tail topics is a major challenge in text mining. Most previous non-hierarchical topic models assume that the topics in documents follow a multinomial distribution, and so ignore the topics in the tail of the distribution curve. Hierarchical topic models can mine long-tail topics by introducing hierarchical relationships among topics, but at the cost of higher computational complexity. In this article, we propose a new method for mining long-tail topics, called graph regularized non-negative matrix factorization with long-tail constraint. It uses KL divergence to measure the difference between matrices, and uses a neighbor graph to preserve the intrinsic geometrical and discriminating structure of the original samples in the low-dimensional space. Experiments show that the proposed algorithm mines more long-tail topic information from documents and improves performance on data mining tasks compared with other methods, such as classical Dirichlet distribution, non-negative matrix, hierarchical matrix, and hierarchical latent Dirichlet distribution.
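The two ingredients named in the abstract, a KL-divergence reconstruction loss and a neighbor-graph regularizer, can be illustrated with a minimal sketch. This is not the authors' algorithm: the long-tail constraint is left out because the abstract does not define it, the update for H is a heuristic multiplicative rule chosen only to keep the factors non-negative, and the names `knn_graph`, `graph_kl_nmf`, `lam`, and `k` are hypothetical.

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetric 0/1 k-nearest-neighbour adjacency over the columns (documents) of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    sq = (X * X).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a document is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                           # symmetrise

def kl_divergence(X, WH, eps=1e-10):
    """Generalised KL divergence D(X || WH)."""
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))

def graph_kl_nmf(X, rank, lam=0.1, k=3, n_iter=200, seed=0, eps=1e-10):
    """Factorise X (terms x documents) as W @ H with a graph smoothness penalty on H.

    Objective (assumed for this sketch): D(X || WH) + lam * tr(H L H^T), L = D - A.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    A = knn_graph(X, k)
    D = np.diag(A.sum(axis=1))
    L = D - A

    def objective():
        return kl_divergence(X, W @ H) + lam * float(np.trace(H @ L @ H.T))

    history = [objective()]  # history[0] is the objective at initialisation
    for _ in range(n_iter):
        # Standard multiplicative KL update for W.
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Heuristic update for H: the Laplacian gradient 2*lam*H(D - A) is split
        # into its positive part (denominator) and negative part (numerator).
        WH = W @ H + eps
        numer = W.T @ (X / WH) + 2.0 * lam * (H @ A)
        denom = W.sum(axis=0)[:, None] + 2.0 * lam * (H @ D) + eps
        H *= numer / denom
        history.append(objective())
    return W, H, history

# Toy term-document count matrix: 20 terms, 12 documents.
X_toy = np.random.default_rng(1).poisson(2.0, size=(20, 12)).astype(float)
W, H, history = graph_kl_nmf(X_toy, rank=3)
```

Here columns of `W` play the role of topics and columns of `H` hold each document's topic weights; the graph term pulls neighbouring documents toward similar weights, which is the sense in which local structure is preserved in the low-dimensional space.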