Maximilian Lucassen, Johan A.K. Suykens, Kim Batselier
Abstract: Least squares support vector machines are a commonly used supervised learning method for nonlinear regression and classification. They can be implemented in either their primal or dual form. The latter requires solving a linear system, which can be advantageous as an explicit mapping of the data to a possibly infinite-dimensional feature space is avoided. However, for large-scale applications, current low-rank approximation methods can perform inadequately. For example, current methods are probabilistic due to their sampling procedures, and/or suffer from a poor trade-off between the ranks and approximation power. In this paper, a recursive Bayesian filtering framework based on tensor networks and the Kalman filter is presented to alleviate the demanding memory and computational complexities associated with solving large-scale dual problems. The proposed method is iterative, does not require explicit storage of the kernel matrix, and allows the formulation of early stopping conditions. Additionally, the framework yields confidence estimates of obtained models, unlike alternative methods. The performance is tested on two regression and three classification experiments, and compared to the Nyström and fixed size LS-SVM methods. Results show that our method can achieve high performance and is particularly useful when alternative methods are computationally infeasible due to a slowly decaying kernel matrix spectrum.
What problem does this paper attempt to address?
The paper addresses the memory and computational cost of solving least-squares support vector machines (LS-SVMs) on large-scale data sets. Specifically, traditional low-rank approximation methods perform poorly when the kernel matrix spectrum decays slowly, which limits their effectiveness in large-scale applications. To address this, the authors propose a recursive Bayesian framework based on tensor networks and Kalman filtering. It lowers the memory and computational requirements of solving large-scale dual problems by avoiding explicit storage of the kernel matrix, and it additionally provides prediction confidence intervals.
### Problems Solved by the Paper:
1. **Large-scale Data Processing**: Traditional methods run into memory and computational limits on large-scale data sets. In particular, as the number of data points \( N \) grows, the \( O(N^3) \) computational complexity and \( O(N^2) \) storage complexity make direct methods infeasible.
2. **Limitations of Low-rank Approximation Methods**: Existing low-rank approximation methods (such as the Nyström method and the fixed-size LS-SVM method) rely on a fast-decaying kernel matrix spectrum and perform poorly when the spectrum decays slowly. In that case they usually require a large sampled subset, and they cannot provide prediction confidence intervals.
3. **Computational Efficiency and Confidence Intervals**: The proposed method not only handles large-scale data sets efficiently but also provides prediction confidence intervals, which existing low-rank approximation methods do not offer.
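To make the \( O(N^2) \) storage and \( O(N^3) \) time of a direct method concrete, the following is a minimal sketch of a direct dual solve for LS-SVM regression in its standard textbook form. The Gaussian kernel, the hyperparameter values `gamma` and `sigma`, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def lssvm_dual_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the (N+1) x (N+1) LS-SVM regression dual system directly.

    System: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    """
    N = X.shape[0]
    A = np.zeros((N + 1, N + 1))          # dense storage: O(N^2) memory
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)         # dense factorization: O(N^3) time
    return sol[0], sol[1:]                # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate the fitted model at new inputs."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Small synthetic example (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)[:, None]
y = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(200)
b, alpha = lssvm_dual_fit(X, y)
y_hat = lssvm_predict(X, b, alpha, X)
```

With 200 points this runs instantly, but the dense matrix `A` alone needs roughly 8 TB in double precision at one million points, which is the regime the low-rank and tensor-based approaches target.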
### Solutions:
- **Tensor Network Representation**: Represent the dual problem in tensor train (TT) form. A vector with \( n^d \) entries is reshaped into a \( d \)-way tensor and stored in \( O(d n r^2) \) memory instead of \( O(n^d) \), where \( n \) is the size of each tensor mode and \( r \) is the TT rank.
- **Recursive Bayesian Filtering**: Solve the dual problem recursively with a tensor network Kalman filter (TNKF), which avoids explicitly constructing and storing large matrices.
- **Early Stopping Conditions**: Define early stopping conditions by monitoring the norm of the covariance matrix, which improves the efficiency and robustness of the algorithm.
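The recursive idea behind the items above can be sketched with a plain, dense Kalman filter that treats each row of the dual linear system as one scalar measurement of a static state. This is only an illustration under stated assumptions: the mean and covariance are stored densely here, whereas the paper keeps them in tensor train form, and the names `kalman_solve_rows`, `make_lssvm_row_fn`, and the threshold `cov_tol` are hypothetical. Rows are generated on the fly, so the kernel matrix itself is never stored by the filter.

```python
import numpy as np

def make_lssvm_row_fn(X, gamma, sigma):
    """Return a function that builds row k of the LS-SVM dual matrix on demand."""
    N = X.shape[0]

    def row_fn(k):
        row = np.empty(N + 1)
        if k == 0:
            row[0] = 0.0
            row[1:] = 1.0
        else:
            i = k - 1
            sq = np.sum((X - X[i]) ** 2, axis=1)
            row[0] = 1.0
            row[1:] = np.exp(-sq / (2.0 * sigma**2))  # Gaussian kernel row
            row[i + 1] += 1.0 / gamma                 # regularization on the diagonal
        return row

    return row_fn

def kalman_solve_rows(row_fn, rhs, n_unknowns, prior_var=1.0,
                      noise_var=1e-8, cov_tol=None):
    """Estimate x in A x = rhs by processing one row of A at a time."""
    m = np.zeros(n_unknowns)               # prior mean
    P = prior_var * np.eye(n_unknowns)     # prior covariance (dense in this sketch)
    rows_used = 0
    for k in range(len(rhs)):
        a = row_fn(k)
        Pa = P @ a
        s = a @ Pa + noise_var             # innovation variance
        gain = Pa / s                      # Kalman gain
        m = m + gain * (rhs[k] - a @ m)    # mean update
        P = P - np.outer(gain, Pa)         # covariance update
        rows_used = k + 1
        # Hypothetical early-stopping rule: stop once the covariance norm is small.
        if cov_tol is not None and np.linalg.norm(P) < cov_tol:
            break
    return m, P, rows_used

# Small synthetic example (illustrative data and hyperparameters).
rng = np.random.default_rng(0)
N, gamma, sigma = 40, 10.0, 0.5
X = rng.uniform(-1.0, 1.0, size=(N, 2))
y = np.sin(3.0 * X[:, 0]) * X[:, 1]
rhs = np.concatenate(([0.0], y))
row_fn = make_lssvm_row_fn(X, gamma, sigma)

x_kf, P, rows_used = kalman_solve_rows(row_fn, rhs, N + 1, prior_var=10.0)

# Reference: dense direct solve of the same system, for comparison only.
A = np.stack([row_fn(k) for k in range(N + 1)])
x_direct = np.linalg.solve(A, rhs)
max_diff = np.max(np.abs(x_kf - x_direct))
std_dev = np.sqrt(np.clip(np.diag(P), 0.0, None))  # per-coefficient uncertainty
```

The diagonal of `P` gives a variance for each estimated coefficient, which is the kind of confidence information a sampling-based low-rank solve does not produce. The dense `P` still costs \( O(N^2) \) memory, which is exactly the part the tensor network representation is meant to replace.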
### Main Contributions:
1. **Overcoming the Curse of Dimensionality**: Address the curse of dimensionality in large-scale LS-SVM dual problems through a low-rank TT representation.
2. **Non-sampling Algorithm**: Propose an algorithm that does not rely on sampling. It evaluates the entire dual matrix and constructs a low-rank tensor network approximation of each row.
3. **Confidence Interval Calculation**: Implement a recursive Bayesian filter that can compute prediction confidence intervals, which existing low-rank approximation methods do not offer.
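As a rough illustration of why a low-rank TT representation saves memory, the sketch below reshapes a long vector into a \( d \)-way tensor and compresses it with the generic TT-SVD procedure (sequential truncated SVDs). This is a textbook construction, not the paper's algorithm; the function names, the test signal, and the choice of mode sizes are assumptions made for the example.

```python
import numpy as np

def tt_svd(vec, dims, max_rank):
    """Decompose a vector of length prod(dims) into TT cores of shape (r_prev, n, r)."""
    cores = []
    r_prev = 1
    rest = np.asarray(vec, dtype=float).reshape(1, -1)
    for n in dims[:-1]:
        # Unfold: rows index (previous rank, current mode), columns index the rest.
        rest = rest.reshape(r_prev * n, -1)
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(max_rank, len(S))                    # truncate to the rank cap
        cores.append(U[:, :r].reshape(r_prev, n, r))
        rest = S[:r, None] * Vt[:r]                  # carry the remainder forward
        r_prev = r
    cores.append(rest.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_vector(cores):
    """Contract the TT cores back into a full vector."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)

# A smooth signal sampled on a uniform grid of 4**6 = 4096 points (illustrative).
dims = [4] * 6
vec = np.sin(np.linspace(0.0, 10.0, 4096))
cores = tt_svd(vec, dims, max_rank=2)

n_params = sum(core.size for core in cores)          # numbers stored in TT form
rel_err = np.linalg.norm(tt_to_vector(cores) - vec) / np.linalg.norm(vec)
```

Comparing `n_params` with `vec.size` shows the storage saving, and `rel_err` shows how much accuracy the rank cap costs. How well this works in practice depends on whether the vector being compressed actually has low TT ranks.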
### Experimental Verification:
The paper reports experiments on two regression tasks and three classification tasks, comparing the proposed method with the Nyström method and the fixed-size LS-SVM method. The results show that the proposed method performs well on large-scale data sets, especially when the kernel matrix spectrum decays slowly. For example, on the two-spiral problem, 100% validation accuracy was achieved using 220 training points.
In summary, by combining tensor networks with Kalman filtering, the paper reduces the memory and computational cost of solving large-scale LS-SVM dual problems and provides prediction confidence intervals, improving the reliability and practicality of the model.