Theoretical Methods In Machine Learning
Badong Chen, Weifeng Liu, Jose C. Principe
DOI: https://doi.org/10.1007/978-3-662-43505-2_30
2015-01-01
Abstract: The problem of optimization in machine learning is well established, but it entails several approximations. The theory of Hilbert spaces, which is principled and well established, helps solve the representation problem in machine learning by providing a rich (universal) class of functions in which the optimization can be conducted. Working with functions is cumbersome, but for the class of reproducing kernel Hilbert spaces (RKHSs) it remains manageable provided the algorithm is restricted to inner products. The best example is the support vector machine (SVM), a batch-mode algorithm that uses a very efficient (supralinear) optimization procedure. However, the problem with SVMs is their high memory and computational complexity. In the large-scale data limit, SVMs are restrictive because, for fast operation, the Gram matrix, which grows with the square of the number of samples, must fit in computer memory. Even in this best-case scenario, the computation is also proportional to the square of the number of samples. This limitation is not specific to the SVM algorithm and is shared by kernel regression. There are also other relevant data processing scenarios, such as streaming data (also called a time series), where the size of the data is unbounded and the data are potentially nonstationary; batch mode is therefore not directly applicable and brings added difficulties. Online learning in kernel space is more efficient in many practical large-scale data applications. Because the training data are presented sequentially to the learning system, online kernel learning in general requires much less memory and computational bandwidth. The drawback is that online algorithms converge only weakly (in mean square) to the optimal solution, i.e., convergence is guaranteed only to within a ball of radius epsilon around the optimum (epsilon is controlled by the user).
But because the theoretically optimal ML solution already involves many approximations, this is one more approximation that is worth exploring in practice. The most important recent advance in this field is the development of kernel adaptive filters (KAFs). The KAF algorithms are developed in reproducing kernel Hilbert space (RKHS), using the linear structure of this space to implement well-established linear adaptive algorithms (e.g., LMS, RLS, APA) and thereby obtain nonlinear filters in the original input space. The main goal of this chapter is to bring these new online learning techniques closer to readers from both the machine learning and signal processing communities. In this chapter, we focus mainly on the kernel least mean square (KLMS), kernel recursive least squares (KRLS), and kernel affine projection (KAPA) algorithms. The derivation of the algorithms and some key aspects, such as mean-square convergence and sparsification of the solutions, are discussed. Several illustrative examples are also presented to demonstrate the learning performance.
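To make the idea of running a linear adaptive algorithm in an RKHS concrete, the following is a minimal sketch of a KLMS-style online update. It is not taken from the chapter. It assumes a Gaussian kernel and the standard textbook form of the update, in which each new sample becomes a kernel center whose coefficient is the step size times the prediction error. The names `gaussian_kernel`, `KLMS`, and `demo`, the parameter values, and the sine target are all hypothetical choices made for illustration. No sparsification is applied, so the stored dictionary grows by one center per sample.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two equal-length sequences (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class KLMS:
    """Illustrative kernel least-mean-square filter without sparsification."""

    def __init__(self, step_size=0.5, sigma=1.0):
        self.step_size = step_size  # hypothetical value, chosen for the demo
        self.sigma = sigma          # kernel width, also an assumption
        self.centers = []           # past inputs kept as kernel centers
        self.coeffs = []            # one coefficient per stored center

    def predict(self, u):
        # Current estimate: weighted sum of kernels centered on past inputs.
        return sum(
            a * gaussian_kernel(c, u, self.sigma)
            for c, a in zip(self.centers, self.coeffs)
        )

    def update(self, u, d):
        # Prediction error is computed before the sample is absorbed.
        err = d - self.predict(u)
        # LMS step in the RKHS: add a new center scaled by step_size * error.
        self.centers.append(tuple(u))
        self.coeffs.append(self.step_size * err)
        return err


def demo(n=300):
    """Learn d = sin(x) one sample at a time; inputs are deterministic."""
    model = KLMS(step_size=0.5, sigma=0.5)
    errors = []
    for i in range(n):
        # Golden-ratio sequence spreads inputs over [-3, 3) without randomness.
        x = -3.0 + 6.0 * ((i * 0.618033988749895) % 1.0)
        errors.append(model.update((x,), math.sin(x)))
    return model, errors


if __name__ == "__main__":
    model, errors = demo()
    early = sum(e * e for e in errors[:20]) / 20
    late = sum(e * e for e in errors[-20:]) / 20
    print("centers stored:", len(model.centers))
    print("early MSE:", early, "late MSE:", late)
```

Each update touches only the stored centers, so the per-sample cost in this sketch grows linearly with the number of samples seen so far, and no Gram matrix is ever formed. Keeping that growth bounded is the role of the sparsification techniques mentioned in the abstract.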