Approximate UMAP allows for high-rate online visualization of high-dimensional data streams

Peter Wassenaar, Pierre Guetschel, Michael Tangermann
2024-04-05
Abstract: In the BCI field, introspection and interpretation of brain signals are desired for providing feedback or to guide rapid paradigm prototyping but are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the projection of data streams in real-time a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP). It aims at generating rapid projections for real-time introspection. To study its suitability for real-time projecting, we benchmark the method against standard UMAP and its neural network counterpart parametric UMAP. Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while decreasing projection time by an order of magnitude and maintaining the same training time.
Machine Learning, Artificial Intelligence, Human-Computer Interaction, Signal Processing
What problem does this paper attempt to address?
This paper addresses the real-time visualization and interpretation of high-dimensional, noisy brain signal data streams in the field of brain-computer interfaces (BCI). Existing projection algorithms such as standard UMAP (Uniform Manifold Approximation and Projection) generate high-quality low-dimensional representations, but their computational cost is high, which makes real-time processing difficult. The paper therefore proposes a new UMAP variant, approximate UMAP (aUMAP), which aims at fast real-time visualization of high-dimensional data streams by reducing projection time while keeping training time and projection quality comparable to standard UMAP.

### Background and Problem Description of the Paper

1. **Characteristics of Brain Signal Data**
   - High-dimensional: Brain signal data usually has a high dimension.
   - Noisy: Brain signal data is easily affected by noise, which makes analysis harder.
2. **Limitations of Existing Methods**
   - **Standard UMAP**: It generates high-quality low-dimensional representations, but its high computational cost makes it unsuitable for real-time processing.
   - **PCA**: It is fast to compute, but it cannot handle data with complex nonlinear structures.
   - **ISOMAP**: It performs well on noisy data, but its high computational complexity makes it unsuitable for large-scale data sets.
   - **parametric UMAP (pUMAP)**: It accelerates projection through neural networks, but the model is heavy and may require specific hardware support.
3. **Research Objectives**
   - Propose a new UMAP variant, aUMAP, which significantly reduces projection time while preserving projection quality.
   - Evaluate the performance of aUMAP on different data sets and verify whether it is suitable for real-time online projection.

### Method Overview

1. **Working Principle of aUMAP**
   - **Model Training**: The training process of aUMAP is the same as that of standard UMAP: the data are fitted by optimizing a well-defined objective function.
   - **Projection of New Data Points**: aUMAP approximates the projection of new data points with a k-NN (k-Nearest Neighbor) method instead of recalculating the entire projection space. The formula is:

     \[ u = \frac{\sum_{i=1}^{k} \frac{1}{d_{i}} u_{i}}{\sum_{j=1}^{k} \frac{1}{d_{j}}} \]

     where \(u\) is the projection of the new data point \(x\), \(k\) is the number of neighbors considered, \(u_{1}, u_{2}, \ldots, u_{k}\) are the UMAP projections of the \(k\) nearest neighbors \(x_{1}, x_{2}, \ldots, x_{k}\) of \(x\) in the input space, and \(d_{i} = \text{distance}(x, x_{i})\) is the distance between \(x\) and its \(i\)-th nearest neighbor.
2. **Experimental Setup**
   - **Data Sets**: Three standard data sets (Iris plants, handwritten digits, Wisconsin breast cancer) were used to evaluate the performance of aUMAP.
   - **Benchmark Tests**: aUMAP was compared with standard UMAP and pUMAP in terms of training time and projection time.
   - **Hardware Configuration**: The experiments were run on an AMD Ryzen 7 5800X 8-core processor and an NVIDIA GeForce RTX 3060 Ti, using Windows Subsystem for Linux (WSL) v.2.0.9.0 to support TensorFlow GPU.

### Experimental Results

1. **Accuracy of aUMAP**
   - The projections generated by aUMAP are very close to those of standard UMAP, with an average Euclidean distance between 0.1 and 0.25 standard deviations.
   - Although aUMAP sometimes produces outliers, overall it maintains a clustering structure similar to that of standard UMAP.
2. **Training Time**
   - aUMAP and...
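
The projection rule given under Method Overview, an inverse-distance-weighted average of the neighbors' embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `approx_project`, the brute-force Euclidean neighbor search, and the `eps` guard against zero distances are all assumptions made for illustration.

```python
import numpy as np


def approx_project(x_new, X_train, U_train, k=5, eps=1e-12):
    """Approximate embeddings of new points (hypothetical helper).

    X_train: (n, d) points the embedding was fitted on.
    U_train: (n, m) their low-dimensional projections, e.g. from UMAP.
    Returns an array of shape (q, m), one row per query point.
    """
    x_new = np.atleast_2d(x_new)
    # Brute-force Euclidean distances, shape (q, n). Assumption: the summary
    # only says "distance"; a tree or approximate index could replace this.
    dists = np.linalg.norm(x_new[:, None, :] - X_train[None, :, :], axis=-1)
    # Indices of the k nearest training points for each query (unordered).
    idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
    d = np.take_along_axis(dists, idx, axis=1)
    # Weights 1/d_i; eps avoids division by zero (an assumption, not from the source).
    w = 1.0 / np.maximum(d, eps)
    # u = sum_i (1/d_i) u_i / sum_j (1/d_j)
    return (w[:, :, None] * U_train[idx]).sum(axis=1) / w.sum(axis=1, keepdims=True)


# Toy data: three training points in 2-D with 1-D embeddings.
X_train = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
U_train = np.array([[0.0], [1.0], [5.0]])

# The query is equidistant from the first two points, so with k=2 the
# result is the plain average of their embeddings, 0.5.
u_mid = approx_project(np.array([1.0, 0.0]), X_train, U_train, k=2)

# A query that coincides with a training point lands (almost exactly)
# on that point's embedding, because its weight dominates.
u_same = approx_project(np.array([0.0, 0.0]), X_train, U_train, k=2)
```

Because only a neighbor lookup and a weighted average are needed per new point, no optimization step runs at projection time, which is consistent with the reduced projection time described above.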