A Linear Text Classification Algorithm Based on Category Relevance Factors

Zhi-Hong Deng, Shi-Wei Tang, Dong-Qing Yang, Ming Zhang, Xiao-Bin Wu, Meng Yang
DOI: https://doi.org/10.1007/3-540-36227-4_9
2002-01-01
Abstract: In this paper, we present a linear text classification algorithm called CRF. Using category relevance factors, CRF computes the feature vectors of the training documents that belong to the same category and, from these feature vectors, induces a profile vector for each category. For new unlabelled documents, CRF uses a modified cosine measure to compute the similarity between each document and each category, and assigns each document to the categories with the highest similarity scores. Only the profile vectors, rather than the vectors of all training documents, take part in computing these similarities. We evaluated our algorithm on a subset of the Reuters-21578 and 20_newsgroups text collections and compared it against k-NN and SVM. Experimental results show that CRF outperforms k-NN and is competitive with SVM.
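The pipeline described in the abstract (weight features per category, build one profile vector per category, compare a new document against the profiles only) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract gives neither the formula for the category relevance factor nor the modification to the cosine measure, so `relevance_weight` is a hypothetical smoothed log-ratio and `cosine` is the standard, unmodified measure. The sketch also assigns a single top-scoring category, and all function names and the toy data are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def relevance_weight(term, category, docs_by_cat):
    """Hypothetical stand-in for a category relevance factor (NOT the paper's
    formula): smoothed log-ratio of the term's document frequency inside
    the category versus outside it."""
    inside = docs_by_cat[category]
    outside = [d for c, ds in docs_by_cat.items() if c != category for d in ds]
    p_in = (sum(term in d for d in inside) + 1) / (len(inside) + 2)
    p_out = (sum(term in d for d in outside) + 1) / (len(outside) + 2)
    return math.log(p_in / p_out)


def build_profiles(docs_by_cat):
    """Build one profile vector per category by summing the relevance-weighted
    term frequencies of that category's training documents."""
    profiles = {}
    for cat, docs in docs_by_cat.items():
        profile = defaultdict(float)
        for doc in docs:
            for term, tf in Counter(doc).items():
                w = relevance_weight(term, cat, docs_by_cat)
                if w > 0:  # keep only terms more typical inside the category
                    profile[term] += tf * w
        profiles[cat] = dict(profile)
    return profiles


def cosine(a, b):
    """Standard cosine similarity between two sparse vectors (dicts).
    The paper uses a modified measure that the abstract does not specify."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, profiles):
    """Return the category whose profile is most similar to the document."""
    vec = {t: float(tf) for t, tf in Counter(doc).items()}
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))


# Toy training data (invented): each document is a list of tokens.
training = {
    "grain": [["wheat", "corn", "export"], ["wheat", "harvest"]],
    "money": [["bank", "rate", "export"], ["bank", "dollar"]],
}
profiles = build_profiles(training)
label = classify(["wheat", "price"], profiles)
```

Because classification compares a document with one profile per category rather than with every training document, its cost grows with the number of categories, not with the size of the training set, which is the contrast with k-NN that the abstract draws.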