Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Wenbin Zheng, Yuntao Qian, Hong Tang
DOI: https://doi.org/10.1007/978-3-642-23896-3_62
2011-01-01
Abstract: Dimensionality reduction can substantially improve the computational efficiency of classifiers in text categorization, and non-negative matrix factorization (NMF) can readily map the high-dimensional term space into a low-dimensional semantic subspace. Moreover, the non-negativity of the basis vectors gives the semantic subspace a meaningful interpretation. However, as a linear reconstruction method, NMF is sensitive to noise, missing data, and outliers, so it usually does not achieve satisfactory classification performance. This paper proposes a novel approach in which the training texts and their category information are fused, and a transformation matrix that maps the term space into a semantic subspace is obtained through NMF with a basis orthogonality constraint followed by truncation. With this transformation, the dimensionality can be reduced aggressively. Experimental results show that the proposed approach maintains good classification performance even at very low dimensionality.
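
The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea under several assumptions, not the authors' method. It assumes that "fusion" means stacking one-hot category rows beneath the term-document matrix, that "truncation" means discarding those category rows from the learned basis, and that the transformation is the pseudo-inverse of the remaining term basis. Plain multiplicative-update NMF stands in for the basis orthogonality variant, and the names `nmf`, `fit_projection`, `transform`, and `weight` are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Standard multiplicative-update NMF, V ~ W @ H with W, H >= 0.
    Stand-in only: the paper uses a basis orthogonality variant."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_projection(X_train, y_train, n_classes, rank, weight=1.0):
    """X_train: non-negative (terms x docs) matrix; y_train: integer labels.
    Returns a (rank x terms) matrix mapping term vectors to the subspace."""
    n_terms, n_docs = X_train.shape
    # Assumed fusion step: append one-hot category rows to the term rows.
    C = np.zeros((n_classes, n_docs))
    C[y_train, np.arange(n_docs)] = weight
    fused = np.vstack([X_train, C])
    W, _ = nmf(fused, rank)
    # Assumed truncation step: keep only the term part of the basis,
    # since category labels are unavailable for unseen documents.
    W_terms = W[:n_terms]
    # Assumed transformation: least-squares projection onto the term basis.
    return np.linalg.pinv(W_terms)

def transform(P, X):
    """Project (terms x docs) data into the low-dimensional subspace."""
    return P @ X
```

In this sketch, `fit_projection` is run once on the labelled training matrix, and `transform` is then applied to both training and test documents before a classifier is trained on the reduced features. Note that the pseudo-inverse may contain negative entries, so the projected features are not guaranteed to be non-negative.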