Non-Negative Matrix Factorization For Filtering Chinese And Oriental Language Document

Bw Xu, Jx Jiang, Jj Lu, P Wang
2004-01-01
Abstract: As the number of documents in Chinese and other oriental languages has increased in recent years, it has become increasingly important to develop filtering systems for oriental-language documents. The classical problems of synonymy and polysemy still exist in these documents, so filtering methods based on latent semantic indexing (LSI), which represents documents by semantic relations between words, perform better than methods that represent documents by words alone. Non-negative matrix factorization (NMF), another dimensionality-reduction method that is distinguished from LSI by its non-negativity constraints, has surpassed LSI in many other fields, such as English-document clustering and classification. In this paper, we propose a new NMF-based method that obtains topic profiles from a set of sample documents and uses them for document filtering. The experimental results show that the new method outperforms a highly effective LSI-based method in filtering oriental-language documents.
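The general idea of deriving a topic profile with NMF and filtering against it can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the sketch assumes the standard Lee-Seung multiplicative updates for the factorization, a profile formed as the mean coefficient vector of the relevant sample documents, and cosine similarity as the filtering score. The toy term-document matrix, the function names `nmf`, `project`, and `cosine`, and the rank of 2 are all hypothetical; for Chinese text, a word-segmentation step would be needed beforehand to produce the term counts.

```python
import numpy as np


def nmf(V, rank, iters=300, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x docs) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared-error objective
    (an assumption; the abstract does not name the update rule).
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, rank)) + eps  # basis vectors (latent topics)
    H = rng.random((rank, n_docs)) + eps   # per-document coefficients
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def project(W, doc_vec, iters=300, eps=1e-9):
    """Non-negative coefficients of one document in the fixed basis W."""
    h = np.ones(W.shape[1])
    WtW = W.T @ W
    Wtv = W.T @ doc_vec
    for _ in range(iters):
        h *= Wtv / (WtW @ h + eps)
    return h


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


# Toy sample set: each row is one document's term counts over 6 terms.
# Documents 0-2 are "relevant" (terms 0-2); documents 3-5 are not.
sample_docs = np.array([
    [3, 1, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [4, 1, 2, 0, 0, 0],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 4, 2],
], dtype=float)
V = sample_docs.T  # terms x docs

W, H = nmf(V, rank=2)

# Topic profile: mean coefficient vector of the relevant sample documents.
relevant = [0, 1, 2]
profile = H[:, relevant].mean(axis=1)

# Score two unseen documents against the profile.
on_topic = np.array([3, 2, 2, 0, 0, 0], dtype=float)
off_topic = np.array([0, 0, 0, 2, 3, 1], dtype=float)
score_on = cosine(project(W, on_topic), profile)
score_off = cosine(project(W, off_topic), profile)

print("on-topic score:", score_on)
print("off-topic score:", score_off)
```

In a filtering system, a document would be accepted when its score exceeds a threshold tuned on held-out data; the threshold and the choice of rank are left open here because the abstract does not specify them.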