Abstract: One of the limiting factors of using support vector machines (SVMs) in large-scale applications is their super-linear computational requirements in terms of the number of training samples. To address this issue, several approaches that train SVMs separately on many small chunks of a large data set have been proposed in the literature. So far, however, almost all of these approaches have only been investigated empirically. In addition, their motivation has always been based on computational requirements. In this work, we consider a localized SVM approach based upon a partition of the input space. For this local SVM, we derive a general oracle inequality. We then apply this oracle inequality to least squares regression using Gaussian kernels and deduce local learning rates that are essentially minimax optimal under some standard smoothness assumptions on the regression function. This gives the first motivation for using local SVMs that is based not on computational requirements but on theoretical predictions of the generalization performance. We further introduce a data-dependent parameter selection method for our local SVM approach and show that this method achieves the same learning rates as before. Finally, we present some larger-scale experiments for our localized SVM showing that it achieves essentially the same test performance as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better.
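
To make the localized approach described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Voronoi partition of the input space induced by a fixed set of centers, and on each cell it fits a least squares SVM with a Gaussian kernel (equivalent to kernel ridge regression). The function names, the kernel parametrization exp(-||x - x'||^2 / gamma^2), the choice of centers, and the values of `lam` and `gamma` are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # Assumed parametrization: k(x, x') = exp(-||x - x'||^2 / gamma^2)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / gamma**2)

def assign_cells(X, centers):
    # Voronoi partition (an assumption): each point goes to its nearest center
    d = np.sum((X[:, None, :] - centers[None, :, :])**2, axis=2)
    return np.argmin(d, axis=1)

def fit_local_ls_svm(X, y, centers, lam, gamma):
    # Train one least squares SVM per cell, using only the samples in that cell
    cells = assign_cells(X, centers)
    models = {}
    for j in range(len(centers)):
        idx = np.where(cells == j)[0]
        if idx.size == 0:
            continue  # no training data in this cell
        Xj, yj = X[idx], y[idx]
        K = gaussian_kernel(Xj, Xj, gamma)
        # Minimizer of lam * ||f||_H^2 + (1/n_j) * sum_i (y_i - f(x_i))^2
        alpha = np.linalg.solve(K + lam * idx.size * np.eye(idx.size), yj)
        models[j] = (Xj, alpha)
    return models

def predict_local_ls_svm(models, centers, X_new, gamma):
    # Each test point is predicted by the model of the cell it falls into;
    # cells without a trained model default to a prediction of 0
    cells = assign_cells(X_new, centers)
    out = np.zeros(X_new.shape[0])
    for j, (Xj, alpha) in models.items():
        mask = cells == j
        if mask.any():
            out[mask] = gaussian_kernel(X_new[mask], Xj, gamma) @ alpha
    return out

# Toy usage on synthetic one-dimensional data (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(400)
centers = np.linspace(-0.75, 0.75, 4)[:, None]
models = fit_local_ls_svm(X, y, centers, lam=1e-3, gamma=0.5)
X_test = np.linspace(-1, 1, 50)[:, None]
pred = predict_local_ls_svm(models, centers, X_test, gamma=0.5)
mse = np.mean((pred - np.sin(3 * X_test[:, 0]))**2)
```

Because each linear system involves only the samples of one cell, the cubic cost of the solve applies to the cell sizes rather than to the full sample size, which is the computational motivation the abstract refers to.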

Adaptive Learning Rates for Support Vector Machines Working on Data with Low Intrinsic Dimension

An Incremental Updating Method for Support Vector Machines

Incremental batch learning with support vector machines

Learning Rates for Classification with Gaussian Kernels

Adaptive Bayesian Regression on Data with Low Intrinsic Dimensionality

Kernel regression, minimax rates and effective dimensionality: beyond the regular case

A Large Dimensional Analysis of Least Squares Support Vector Machines

Optimal learning rates for Kernel Conjugate Gradient regression

Optimal Rate of Kernel Regression in Large Dimensions

Optimal Learning Rates for Localized SVMs

Dimension-independent learning rates for high-dimensional classification problems

Adaptive Inference in Multivariate Nonparametric Regression Models Under Monotonicity

Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

Optimal Learning Rates for Distribution Regression

Learning Rates for Kernel-Based Expectile Regression

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

The statistical rate for support matrix machines under low rankness and row (column) sparsity

Robust Regularized Low-Rank Matrix Models for Regression and Classification

Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition

A Support Vector Machine with a Hybrid Kernel and Minimal Vapnik-Chervonenkis Dimension