Local Relevance Weighted Maximum Margin Criterion for Text Classification.

Quanquan Gu, Jie Zhou
DOI: https://doi.org/10.1137/1.9781611972795.97
2009-01-01
Abstract: Text classification is a very important task in information retrieval and data mining. In the vector space model (VSM), a document is represented as a high-dimensional vector, and a feature extraction phase is usually needed to reduce the dimensionality of the document. In this paper, we propose a feature extraction method named Local Relevance Weighted Maximum Margin Criterion (LRWMMC). It aims to learn a subspace in which, within the local region of each document, documents in the same class are as near as possible while documents in different classes are as far apart as possible. Furthermore, the relevance is taken into account as a weight to determine the extent to which the documents will be projected. LRWMMC is able to find the low-dimensional manifold embedded in the high-dimensional ambient space. In addition, we generalize LRWMMC to Reproducing Kernel Hilbert Space (RKHS), which can resolve the nonlinearity of the input space. We also generalize LRWMMC to tensor space, which is suitable for a new document representation named the tensor space model (TSM). Moreover, in order to utilize the large amount of unlabeled documents, we present a Semi-Supervised LRWMMC, which aims to find a projection inferred from the labeled samples as well as the unlabeled samples. Finally, we present a fast algorithm based on QR-decomposition to make the proposed methods applicable to large-scale data sets. Encouraging experimental results on benchmark text classification data sets indicate that the proposed methods outperform many existing feature extraction methods for text classification.
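
The abstract does not spell out the LRWMMC objective, so the following is only a minimal sketch of the plain, global Maximum Margin Criterion that the method's name refers to: it maximizes tr(W^T (S_b - S_w) W) under W^T W = I, where S_b and S_w are the between-class and within-class scatter matrices. The local neighborhoods and relevance weights of LRWMMC are not modeled here, and the function name `mmc_projection` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Plain (global, unweighted) Maximum Margin Criterion projection.

    Assumption: this is the standard MMC, not the paper's LRWMMC,
    whose local relevance weighting is not given in the abstract.
    X is (n_samples, n_features); y holds one class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))  # between-class scatter
    S_w = np.zeros((d, d))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    # S_b - S_w is symmetric, so eigh applies; the orthonormal eigenvectors
    # with the largest eigenvalues maximize tr(W^T (S_b - S_w) W).
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Toy usage: two classes separated along the first feature only.
X = np.array([[0, 0], [0, 1], [0, -1], [10, 0], [10, 1], [10, -1]])
y = np.array([0, 0, 0, 1, 1, 1])
W = mmc_projection(X, y, n_components=1)
Z = X @ W  # documents projected onto the learned 1-D subspace
```

In this toy data the learned direction aligns with the first feature, which is the only one that separates the classes. Unlike Fisher-style criteria, the difference form S_b - S_w needs no inversion of S_w, which matters when the feature dimension exceeds the number of documents.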