Abstract: The \emph{Fast Gaussian Transform} (FGT) enables subquadratic-time multiplication of an $n\times n$ Gaussian kernel matrix $\mathsf{K}_{i,j} = \exp(-\| x_i - x_j \|_2^2)$ with an arbitrary vector $h \in \mathbb{R}^n$, where $x_1,\dots, x_n \in \mathbb{R}^d$ are a set of \emph{fixed} source points. This kernel plays a central role in machine learning and random feature maps. Nevertheless, in most modern data analysis applications, datasets are dynamically changing (yet often have low rank), and recomputing the FGT from scratch in (kernel-based) algorithms incurs a major computational overhead ($\gtrsim n$ time for a single source update $\in \mathbb{R}^d$). These applications motivate a \emph{dynamic FGT} algorithm, which maintains a dynamic set of sources under \emph{kernel-density estimation} (KDE) queries in \emph{sublinear time} while retaining Mat-Vec multiplication accuracy and speed.
Assuming the dynamic data-points $x_i$ lie in a (possibly changing) $k$-dimensional subspace ($k\leq d$), our main result is an efficient dynamic FGT algorithm, supporting the following operations in $\log^{O(k)}(n/\varepsilon)$ time: (1) Adding or deleting a source point, and (2) Estimating the ``kernel-density'' of a query point with respect to sources with $\varepsilon$ additive accuracy. The core of the algorithm is a dynamic data structure for maintaining the \emph{projected} ``interaction rank'' between source and target boxes, decoupled into finite truncation of Taylor and Hermite expansions.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the problem of performing the Fast Gaussian Transform (FGT) efficiently on dynamic datasets. Existing FGT algorithms are static: they assume that the set of source points is fixed. In many modern data analysis applications, however, source points are added or removed over time, and recomputing the FGT from scratch after each change costs at least time linear in $n$ for a single source update.
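To make the baseline cost concrete, here is a minimal sketch of the direct computation that the FGT is meant to avoid. The function names (`gaussian_kernel`, `naive_matvec`, `naive_kde`) are hypothetical and chosen for illustration; this is the brute-force method, not the paper's algorithm.

```python
import math

def gaussian_kernel(x, y):
    """exp(-||x - y||_2^2) for two points given as equal-length sequences."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def naive_matvec(points, h):
    """Compute K h directly, with K[i][j] = exp(-||x_i - x_j||^2).

    Costs O(n^2 d) time, which is the quadratic baseline.
    """
    return [
        sum(gaussian_kernel(xi, xj) * hj for xj, hj in zip(points, h))
        for xi in points
    ]

def naive_kde(points, h, query):
    """Weighted kernel sum at a single query point, in O(n d) time."""
    return sum(gaussian_kernel(query, xj) * hj for xj, hj in zip(points, h))
```

With this baseline, every query touches all $n$ sources, so a single insertion or deletion forces linear work on the next query.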
To tackle this challenge, this paper proposes a dynamic FGT algorithm that supports the following operations in sublinear time:
1. **Adding or removing source points**: Completed in polylogarithmic time, $\log^{O(k)}(n/\varepsilon)$ per update.
2. **Estimating kernel density at query points**: Completed in $\log^{O(k)}(n/\varepsilon)$ time per query, with additive error at most $\varepsilon$.
The running-time guarantees assume that the data points lie in a (possibly changing) $k$-dimensional subspace with $k \leq d$, so the cost depends on $k$ rather than on the ambient dimension $d$. The algorithm also retains the accuracy and speed of matrix-vector multiplication.
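The abstract attributes the update and query bounds to truncated Taylor and Hermite expansions. The sketch below illustrates why such expansions suit dynamic updates, under strong simplifying assumptions: one dimension, a single box of sources close to one center, and no box hierarchy. It relies on the standard identity $e^{-(t-s)^2} = \sum_{n \geq 0} \frac{s^n}{n!} e^{-t^2} H_n(t)$, where $H_n$ are the physicists' Hermite polynomials. The class name `HermiteBox1D` and the parameter `terms` are hypothetical; this is not the paper's data structure.

```python
import math

class HermiteBox1D:
    """Truncated Hermite expansion about `center` for nearby 1-D sources."""

    def __init__(self, center, terms=20):
        self.center = center
        self.terms = terms
        # coeffs[n] stores A_n = sum_j h_j * (x_j - center)^n / n!
        self.coeffs = [0.0] * terms

    def _update(self, x, weight, sign):
        s = x - self.center
        term = sign * weight  # holds sign * weight * s^n / n!
        for n in range(self.terms):
            self.coeffs[n] += term
            term *= s / (n + 1)

    def insert(self, x, weight):
        """Add a source in O(terms) time; coefficients are linear in sources."""
        self._update(x, weight, +1.0)

    def delete(self, x, weight):
        """Remove a previously inserted source by subtracting its terms."""
        self._update(x, weight, -1.0)

    def query(self, t):
        """Approximate sum_j h_j * exp(-(t - x_j)^2) in O(terms) time."""
        u = t - self.center
        h_prev, h_curr = 0.0, 1.0  # H_{-1} (unused placeholder) and H_0
        total = 0.0
        for n in range(self.terms):
            total += self.coeffs[n] * h_curr
            # Recurrence: H_{n+1}(u) = 2u H_n(u) - 2n H_{n-1}(u)
            h_prev, h_curr = h_curr, 2.0 * u * h_curr - 2.0 * n * h_prev
        return math.exp(-u * u) * total
```

Because each coefficient is a sum over sources, inserting or deleting a source changes only `terms` numbers, and a query never revisits individual sources. The truncation is accurate only when sources stay close to the center (for example within distance 0.5 with 20 terms), which is why a full method partitions space into boxes.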
### Key Contributions
- **Dynamic Data Structure**: The paper designs a dynamic data structure that maintains the set of source points and supports fast kernel density estimation and matrix-vector multiplication queries.
- **Low-Dimensional Subspace Optimization**: When data points lie in a $k$-dimensional subspace, the exponent in the running time depends on $k$ rather than on the ambient dimension $d$, which avoids the exponential dependence on $d$ that high-dimensional data would otherwise incur.
- **Theoretical Guarantees**: The paper provides a rigorous analysis of the running time and accuracy of the algorithm in dynamic settings.
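The low-dimensional assumption can be illustrated with a short sketch. If the sources lie in the span of an orthonormal basis, the Pythagorean theorem lets the kernel be evaluated on $k$-dimensional coordinates, with an extra factor for the part of the query outside the subspace. The helper names below are hypothetical, and the sketch assumes the basis is known and orthonormal; how the paper maintains a changing subspace is not shown here.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(basis, x):
    """Coordinates of x in an orthonormal basis (a list of k vectors in R^d)."""
    return [dot(u, x) for u in basis]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kernel_via_projection(basis, query, source):
    """exp(-||query - source||^2), assuming source lies in span(basis)."""
    q_low = project(basis, query)
    s_low = project(basis, source)
    # Squared distance from the query to the subspace (zero if it lies in it).
    residual = dot(query, query) - dot(q_low, q_low)
    return math.exp(-residual) * math.exp(-sq_dist(q_low, s_low))
```

The residual factor depends only on the query, so it can be computed once per query and applied to a kernel sum carried out entirely in $k$ dimensions.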
### Application Background
- **Machine Learning**: Many machine learning tasks, such as kernel PCA, ridge regression, and Gaussian process regression, repeatedly multiply a kernel matrix by a vector. The dynamic FGT algorithm can speed up this step when the data change between multiplications.
- **Online Learning**: In online learning scenarios, datasets change continuously. The dynamic FGT algorithm can update the underlying data structure as points arrive or leave, without recomputing from scratch, while keeping predictions accurate.
### Future Directions
- **Further Optimization of Time Complexity**: The current algorithm has achieved quasi-linear time complexity in static settings, but whether it can be further optimized to linear time complexity in certain specific cases remains an open question.
- **Handling Slowly Decaying Kernel Functions**: Current fast multipole method (FMM) techniques struggle to achieve high precision for slowly decaying kernel functions. Future research may need to develop new techniques to address this issue.
In summary, this paper proposes a dynamic FGT algorithm that keeps the Fast Gaussian Transform efficient on changing datasets, supporting source updates and kernel-density queries in sublinear time for applications in machine learning and data analysis.