Estimating the Generalization Performance of Polynomial SVM Classifier for Text Categorization

Sun Jiantao (孙建涛), Guo Chonghui (郭崇慧), Lu Yuchang (陆玉昌), Shi Chunyi (石纯一)
2004-01-01
Abstract: VC theory and the structural risk minimization (SRM) principle are key concepts of statistical learning theory. The support vector machine (SVM), developed from this theory, is widely studied and used for text categorization because of its high generalization performance. Previous work reported that the performance of polynomial SVMs is independent of the polynomial order, and that they are suitable for high-dimensional text categorization problems without feature selection. This research shows that over-fitting occurs as the polynomial order increases, and that SVM generalization performance degrades sharply when too many features are used, so feature selection is necessary. Based on the SRM principle, these findings are analyzed by estimating the VC dimension of the corresponding function classes, and the empirical results support the theoretical conclusions.
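The SRM argument can be illustrated with a minimal sketch, assuming the standard textbook quantities rather than the paper's own estimates (which the abstract does not state). It uses two known facts: the class of sign-thresholded real polynomials of total degree at most d in n variables has VC dimension C(n + d, d), and Vapnik's bound says that, with probability at least 1 - eta, the true risk is at most the empirical risk plus a confidence term that grows with the VC dimension h and shrinks with the sample size l. The function names `vc_dim_poly` and `vc_confidence`, the training-set size and the vocabulary sizes below are hypothetical, chosen only for illustration.

```python
import math


def vc_dim_poly(n_features: int, degree: int) -> int:
    """VC dimension of {sign(p(x))} where p ranges over all real polynomials
    of total degree <= degree in n_features variables.

    This equals the number of monomials of degree <= degree, C(n + d, d).
    It is a worst-case capacity for the whole polynomial class; an SVM's
    margin can make its effective capacity much smaller.
    """
    return math.comb(n_features + degree, degree)


def vc_confidence(h: int, n_samples: int, eta: float = 0.05) -> float:
    """Vapnik's VC confidence term: with probability >= 1 - eta,
    R <= R_emp + vc_confidence(h, n_samples, eta).

    Design choice (an assumption of this sketch): when h >= n_samples the
    bound carries no information, so math.inf is returned.
    """
    if h <= 0 or n_samples <= 0 or not 0 < eta < 1:
        raise ValueError("need h > 0, n_samples > 0 and 0 < eta < 1")
    if h >= n_samples:
        return math.inf
    numerator = h * (math.log(2 * n_samples / h) + 1) - math.log(eta / 4)
    return math.sqrt(numerator / n_samples)


if __name__ == "__main__":
    n_train = 10_000  # hypothetical number of training documents
    for n in (100, 1000, 5000):  # hypothetical vocabulary sizes
        for d in (1, 2, 3):  # polynomial kernel order
            h = vc_dim_poly(n, d)
            print(f"n={n:5d} d={d} h={h:>14d} confidence={vc_confidence(h, n_train):.3f}")
```

With a fixed training set, raising either the order d or the number of features n inflates h combinatorially, and the confidence term quickly becomes vacuous. This mirrors, in worst-case form, the abstract's two claims that higher orders over-fit and that feature selection is needed.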