Multidimensional Latent Semantic Analysis Using Term Spatial Information

Haijun Zhang, John K. L. Ho, Q. M. Jonathan Wu, Yunming Ye
DOI: https://doi.org/10.1109/tsmcc.2012.2227112
IF: 11.8
2013-01-01
IEEE Transactions on Cybernetics
Abstract: In this paper, we consider the problem of in-depth document analysis. In particular, we propose a novel document analysis method, named multidimensional latent semantic analysis (MDLSA), which enables us to mine local information efficiently from a document with respect to term associations and spatial distributions. MDLSA works by first partitioning each document into paragraphs and building a term affinity graph, which represents the frequency of term co-occurrence in a paragraph. We then conduct a 2-D principal component analysis to achieve an optimal semantic mapping. This analysis involves finding the leading eigenvectors of the sample covariance matrix of a training set to characterize the lower dimensional semantic space. A hybrid document similarity measure is designed to further improve the performance of this framework. Our algorithm is examined in two document applications: retrieval and classification. Experimental results demonstrate that the proposed technique outperforms current algorithms with respect to accuracy and computational efficiency.
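The pipeline described in the abstract (paragraph partitioning, a term affinity matrix, then 2-D PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: paragraphs are split on blank lines, tokens are split on whitespace, the affinity entry for a term pair counts the paragraphs in which both terms occur, and the names `affinity_matrix`, `fit_2dpca`, and `project` are hypothetical. The 2-D PCA step follows the standard formulation, taking the leading eigenvectors of G = (1/N) Σ_k (A_k − Ā)ᵀ (A_k − Ā).

```python
from itertools import combinations

import numpy as np


def affinity_matrix(document, vocab):
    """Count, for each term pair, the paragraphs in which both terms occur."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    # Assumption: a blank line separates paragraphs.
    for paragraph in document.split("\n\n"):
        present = sorted({index[w] for w in paragraph.lower().split() if w in index})
        for i in present:
            A[i, i] += 1  # diagonal: number of paragraphs containing the term
        for i, j in combinations(present, 2):
            A[i, j] += 1  # keep the matrix symmetric
            A[j, i] += 1
    return A


def fit_2dpca(matrices, d):
    """Return the d leading eigenvectors of the sample covariance matrix."""
    stack = np.stack(matrices)
    centered = stack - stack.mean(axis=0)
    # G[j, l] = (1/N) * sum over k of ((A_k - mean)^T (A_k - mean))[j, l]
    G = np.einsum("kij,kil->jl", centered, centered) / len(matrices)
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]        # reorder so the largest come first


def project(A, X):
    """Map an affinity matrix into the lower-dimensional semantic space."""
    return A @ X


# Toy training set with a hypothetical four-term vocabulary.
vocab = ["graph", "term", "matrix", "document"]
docs = [
    "graph term\n\nterm matrix",
    "document matrix\n\ndocument graph term",
    "matrix matrix graph\n\ndocument term",
]
mats = [affinity_matrix(doc, vocab) for doc in docs]
X = fit_2dpca(mats, d=2)
features = [project(A, X) for A in mats]
```

Each document is then represented by a feature matrix of shape (vocabulary size, d) rather than a single vector. The hybrid similarity measure mentioned in the abstract is not specified there, so it is not sketched here.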