Web Search with Text Categorization Using Probabilistic Framework of SVM

B. P. C. Lim, M. H. Tsui, V. Charastrakul, D. Shi
DOI: https://doi.org/10.1109/icsmc.2006.384566
2006-01-01
Abstract: Text categorization algorithms help manage the ever-increasing number of documents, both online and offline. Their ability to organize large document collections into predefined categories significantly improves efficiency and reduces the human effort required. Recently, the support vector machine (SVM) has gained popularity owing to its excellent generalization ability and fast training speed on large datasets. However, the performance of an SVM relies heavily on the penalty coefficient parameter and the kernel parameters. In this paper, we implement a probabilistic framework for the support vector machine (PSVM) that automatically tunes the penalty coefficient parameter and the kernel parameters via the Markov chain Monte Carlo (MCMC) method, and we apply it to Web search via text categorization. The framework was tested on a well-known benchmark text categorization dataset, and the PSVM results were compared against those of a conventional SVM, K-nearest neighbor with P-tree (KNN-Ptree), and KNN. The proposed methodology is then applied to develop a Web search engine.
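To make the idea of MCMC-based hyperparameter tuning concrete, the following is a minimal sketch of a random-walk Metropolis sampler over the log of a penalty coefficient C and the log of a kernel width gamma. It is not the authors' PSVM: the abstract does not specify the sampler or the posterior, so the function names (`metropolis_tune`, `toy_log_posterior`), the step size, and the Gaussian-shaped toy target are all assumptions made for illustration. In practice, the toy target would be replaced by an unnormalized log-posterior computed from an SVM's fit to the data.

```python
import math
import random


def toy_log_posterior(log_c, log_gamma):
    """Hypothetical stand-in for a hyperparameter log-posterior.

    A Gaussian bump centred at C = 10, gamma = 0.01 (standard deviation 0.5
    in log space). A real application would score an SVM trained with
    (exp(log_c), exp(log_gamma)) instead.
    """
    dc = (log_c - math.log(10.0)) / 0.5
    dg = (log_gamma - math.log(0.01)) / 0.5
    return -0.5 * (dc * dc + dg * dg)


def metropolis_tune(log_score, init, n_steps=5000, step=0.3, seed=0):
    """Random-walk Metropolis over (log C, log gamma).

    log_score is any function returning an unnormalized log-density;
    returns the list of visited (log C, log gamma) states.
    """
    rng = random.Random(seed)
    current = tuple(init)
    current_lp = log_score(*current)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal around the current state.
        proposal = tuple(x + rng.gauss(0.0, step) for x in current)
        proposal_lp = log_score(*proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    chain = metropolis_tune(toy_log_posterior, init=(0.0, 0.0))
    kept = chain[1000:]  # discard burn-in
    mean_log_c = sum(s[0] for s in kept) / len(kept)
    mean_log_gamma = sum(s[1] for s in kept) / len(kept)
    print("C estimate:", math.exp(mean_log_c))
    print("gamma estimate:", math.exp(mean_log_gamma))
```

Sampling in log space keeps both parameters positive without needing explicit constraints, which is a common choice for scale-type hyperparameters rather than something stated in the abstract.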