Web Document Categorization by Support Vector Clustering.

Daming Shi, Ming Hei Tsui, Jigang Liu
DOI: https://doi.org/10.1109/icsmc.2008.4811495
2008-01-01
Abstract: Search engines have proven effective for retrieving information from the World Wide Web. Traditionally, search results are presented as a list ordered by popularity and relevance. However, the enormous number of matched Web pages makes it inefficient for users to locate the most relevant ones. A proper organization of the search results is therefore important for improving the browsability of Web search. In this paper, we propose performing Support Vector Clustering (SVC) on the search results to reorganize them into groups of similar context, so that users can browse them effectively. SVC is a nonparametric clustering algorithm that can find clusters of arbitrary shape without requiring the number of clusters to be specified. It is a kernel clustering method that maps the data via a nonlinear function into a high-dimensional feature space. To obtain the optimal clustering result, choosing accurate values for the SVC parameters (kernel width and penalty coefficient) is crucial, so this paper also proposes an automatic tuning method for these parameters. Experimental results demonstrate the effectiveness and usefulness of this method, with performance comparable to that of other popular clustering techniques.
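To make the roles of the two SVC parameters concrete, here is a minimal plain-Python sketch of Support Vector Clustering as it is commonly formulated (Ben-Hur et al.), not the authors' implementation. It assumes a Gaussian kernel K(x, y) = exp(-q * ||x - y||^2) with width q, solves the dual problem under the constraints 0 <= beta_j <= C and sum(beta) = 1, and labels clusters by joining two points whenever sampled points on the segment between them stay inside the feature-space sphere. The function names, the projected-gradient solver, the sample count, the tolerance, and the toy 2-D points standing in for document vectors are all illustrative assumptions. The paper's automatic tuning method is not reproduced; q and C are simply fixed by hand.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is the kernel width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def project_capped_simplex(v, C, iters=60):
    """Project v onto {beta : 0 <= beta_j <= C, sum(beta) = 1} by bisecting on a shift tau.
    Assumes C >= 1/len(v), otherwise the feasible set is empty."""
    lo, hi = min(v) - C, max(v)  # clipped sum is len(v)*C at lo and 0 at hi
    for _ in range(iters):
        tau = (lo + hi) / 2
        if sum(min(max(x - tau, 0.0), C) for x in v) > 1.0:
            lo = tau
        else:
            hi = tau
    tau = (lo + hi) / 2
    return [min(max(x - tau, 0.0), C) for x in v]


def svc_fit(X, q, C, steps=1000):
    """Solve the SVC dual by projected gradient (an assumed solver choice).
    With a Gaussian kernel K(x, x) = 1, so maximizing
    W = sum_j beta_j K_jj - beta^T K beta reduces to minimizing beta^T K beta."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], q) for j in range(n)] for i in range(n)]
    beta = [1.0 / n] * n
    step = 1.0 / (2.0 * n)  # gradient 2*K*beta is Lipschitz with constant <= 2n
    for _ in range(steps):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        beta = project_capped_simplex([b - step * g for b, g in zip(beta, grad)], C)
    return K, beta


def svc_labels(X, q, C, samples=20, tol=1e-2):
    """Assign cluster labels as connected components of the segment-inside-sphere graph."""
    n = len(X)
    K, beta = svc_fit(X, q, C)
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))

    def r2(y):  # squared feature-space distance from Phi(y) to the sphere centre
        return 1.0 - 2.0 * sum(b * gaussian_kernel(xj, y, q) for b, xj in zip(beta, X)) + const

    eps = 1e-8
    free = [i for i in range(n) if eps < beta[i] < C - eps]  # unbounded support vectors
    sv = free or [i for i in range(n) if beta[i] > eps]
    R2 = max(r2(X[i]) for i in sv)  # squared sphere radius

    def connected(i, j):
        # Sample interior points of the segment x_i -> x_j; all must lie inside the sphere.
        for k in range(1, samples + 1):
            t = k / (samples + 1)
            y = [a + t * (b - a) for a, b in zip(X[i], X[j])]
            if r2(y) > R2 + tol:
                return False
        return True

    labels, current = [-1] * n, 0
    for start in range(n):  # depth-first search over the adjacency graph
        if labels[start] != -1:
            continue
        stack, labels[start] = [start], current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and connected(i, j):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Two well-separated toy groups standing in for document vectors.
    X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
         (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2)]
    print(svc_labels(X, q=1.0, C=1.0))
```

With these two tight, distant groups, each group should come out as its own cluster. Larger q narrows the kernel and tends to split the data into more clusters, while C < 1 allows some points to become bounded support vectors (beta_j = C) outside the sphere. This sketch still labels such points, often as singletons, whereas the SVC literature leaves them unclassified or assigns them to the nearest cluster.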
What problem does this paper attempt to address?