Abstract: Optimization in machine learning is a well-established problem, but it entails several approximations. The theory of Hilbert spaces, which is principled and well established, helps solve the representation problem in machine learning by providing a rich (universal) class of functions in which the optimization can be conducted. Working with functions is cumbersome, but for the class of reproducing kernel Hilbert spaces (RKHSs) it remains manageable provided the algorithm is restricted to inner products. The best example is the support vector machine (SVM), a batch-mode algorithm that uses a very efficient (supralinear) optimization procedure. However, SVMs have high memory and computational complexity. In the large-scale data limit, SVMs are restrictive because fast operation requires the Gram matrix, whose size grows with the square of the number of samples, to fit in computer memory. Even in this best-case scenario, the computation is also proportional to the square of the number of samples. This limitation is not specific to the SVM algorithm and is shared by kernel regression. There are also other relevant data processing scenarios, such as streaming data (also called time series), where the data are unbounded in size and potentially nonstationary; batch mode is therefore not directly applicable and brings added difficulties. Online learning in kernel space is more efficient in many practical large-scale data applications. Because the training data are presented sequentially to the learning system, online kernel learning generally requires much less memory and computational bandwidth. The drawback is that online algorithms converge only weakly (in mean square) to the optimal solution, i.e., convergence is guaranteed only to within a ball of radius epsilon around the optimum (epsilon is controlled by the user).
But because the theoretically optimal ML solution already involves many approximations, this is one more approximation worth exploring in practice. The most important recent advance in this field is the development of kernel adaptive filters (KAFs). KAF algorithms are developed in a reproducing kernel Hilbert space (RKHS), using the linear structure of this space to implement well-established linear adaptive algorithms (e.g., LMS, RLS, APA) and obtain nonlinear filters in the original input space. The main goal of this chapter is to bring these new online learning techniques closer to readers from both the machine learning and signal processing communities. In this chapter, we focus mainly on the kernel least mean square (KLMS), kernel recursive least squares (KRLS), and kernel affine projection algorithms (KAPAs). The derivation of the algorithms and some key aspects, such as mean-square convergence and the sparsification of the solutions, are discussed. Several illustrative examples are also presented to demonstrate the learning performance.
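
To make the online setting concrete, the following is a minimal sketch of a KLMS-style filter and is not taken from the chapter itself. It assumes a Gaussian kernel, a fixed step size, and no sparsification, so the filter stores one center per sample. The names `gaussian_kernel`, `KLMS`, and `demo`, as well as the parameter values and the toy target `sin(x)`, are hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two equal-length sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class KLMS:
    """KLMS-style filter without sparsification (illustrative sketch)."""

    def __init__(self, step_size=0.5, sigma=1.0):
        self.step_size = step_size  # assumed fixed step size
        self.sigma = sigma          # assumed Gaussian kernel width
        self.centers = []           # stored input samples
        self.coeffs = []            # one coefficient per stored sample

    def predict(self, x):
        # The filter output is a kernel expansion over the stored centers.
        return sum(a * gaussian_kernel(c, x, self.sigma)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, d):
        # LMS step in feature space: the prediction error, scaled by the
        # step size, becomes the coefficient of a new center placed at x.
        error = d - self.predict(x)
        self.centers.append(tuple(x))
        self.coeffs.append(self.step_size * error)
        return error


def demo(n_samples=200):
    """Stream samples of a toy target, sin(x), through the filter."""
    model = KLMS(step_size=0.5, sigma=0.5)
    errors = []
    for i in range(n_samples):
        # Deterministic pseudo-random inputs spread over [-3, 3).
        x = (-3.0 + 6.0 * ((i * 0.618034) % 1.0),)
        errors.append(model.update(x, math.sin(x[0])))
    return model, errors
```

Each update needs only kernel evaluations against the stored centers, so no Gram matrix is ever formed. Because this sketch omits sparsification, the cost per sample still grows linearly with the number of samples seen so far.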
