Learning Kernels via Margin-and-Radius Ratios
Kun Gai, Guangyun Chen, Changshui Zhang
2010-01-01
Abstract: Despite the great success of SVMs, it is usually difficult for users to select suitable kernels for SVM classifiers. Kernel learning has been developed to jointly learn both a kernel and an SVM classifier [1]. Most existing kernel learning approaches, e.g., [2, 3, 4], employ the margin-based formulation, which is equivalent to

$$\min_{k,\, w,\, b,\, \xi_i} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i, \quad \text{s.t.} \;\; y_i\left(\langle \phi(x_i; k), w \rangle + b\right) + \xi_i \ge 1, \;\; \xi_i \ge 0, \tag{1}$$

where $k$ is the learned kernel, which implicitly defines a transformation $\phi(\cdot\,; k)$ to a feature space by $k(x_c, x_d) = \langle \phi(x_c; k), \phi(x_d; k) \rangle$, $(w, b)$ is an SVM classifier, and $x_i$, $y_i$ and $\xi_i$ are the input instances, labels and hinge losses, respectively. To make the problem tractable, the learned kernel is usually restricted to a parametric form $k^{(\theta)}(\cdot, \cdot)$, where $\theta = [\theta_i]_i$ is the kernel parameter. The most commonly used form is a linear combination of multiple basis kernels:

$$k^{(\theta)}(\cdot, \cdot) = \sum_{j=1}^{m} \theta_j k_j(\cdot, \cdot), \quad \theta_j \ge 0. \tag{2}$$

Let $\gamma$ denote the margin of the SVM classifier with $k$. It is well known that $\gamma^{-2} = \|w\|^2$. Thus the term $\|w\|^2$ makes the margin-based formulation (1) prefer the kernel that results in an SVM classifier with a larger margin. However, the margin alone does not adequately describe the goodness of a kernel. Any kernel, even one with poor performance, can attain an arbitrarily large margin by enlarging the kernel's scaling, and may therefore be selected as the final solution [5]. Margin-based kernel learning methods thus suffer from scaling problems. In the linear combination case, a remedy is to enforce a norm constraint on the kernel parameters. Unfortunately, it is difficult to select a suitable type of norm constraint, and with norm constraints the scaling problem also causes another initialization problem: different initial scalings of basis …
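The scaling problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a toy setting with exactly one positive and one negative training point, where the hard-margin SVM margin has the closed form of half the feature-space distance between the two points. The basis kernels (`linear_kernel`, `rbf_kernel`), the helper names, and the weights `theta` are hypothetical choices made only for illustration. The sketch builds a combined kernel as in (2) and shows that multiplying the kernel by a factor a multiplies the margin by the square root of a, without changing the decision boundary.

```python
import math

def linear_kernel(x, z):
    # Basis kernel k_1: plain inner product (illustrative choice).
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, width=1.0):
    # Basis kernel k_2: Gaussian RBF (illustrative choice).
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * width ** 2))

def combined_kernel(theta, basis_kernels):
    # k(theta)(x, z) = sum_j theta_j * k_j(x, z) with theta_j >= 0, as in (2).
    if any(t < 0 for t in theta):
        raise ValueError("kernel weights theta_j must be non-negative")
    def k(x, z):
        return sum(t * kj(x, z) for t, kj in zip(theta, basis_kernels))
    return k

def scaled(k, a):
    # Same kernel with its overall scaling enlarged by a factor a > 0.
    return lambda x, z: a * k(x, z)

def two_point_margin(k, x_pos, x_neg):
    # Toy assumption: one positive and one negative point. The hard-margin
    # SVM margin is then half the feature-space distance between them,
    # computed from kernel values only.
    sq_dist = k(x_pos, x_pos) + k(x_neg, x_neg) - 2.0 * k(x_pos, x_neg)
    return 0.5 * math.sqrt(sq_dist)

x_pos, x_neg = (1.0, 0.0), (0.0, 1.0)
k = combined_kernel((0.5, 0.5), (linear_kernel, rbf_kernel))

gamma = two_point_margin(k, x_pos, x_neg)
gamma_scaled = two_point_margin(scaled(k, 100.0), x_pos, x_neg)

# The margin grows by sqrt(100) = 10, so ||w||^2 = gamma^-2 shrinks by 100.
print(gamma, gamma_scaled, gamma_scaled / gamma)
```

Because the scaled kernel separates the data in exactly the same way, the larger margin says nothing about classification quality, which is why an objective based only on the margin can favor an arbitrarily rescaled kernel.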