Abstract: Visualization of high-dimensional data is a fundamental yet challenging problem in data mining. Visualization techniques are commonly used to reveal patterns in high-dimensional data, such as clusters and the similarity among clusters. Recently, some successful visualization tools (e.g., BH-t-SNE and LargeVis) have been developed. However, they have two limitations: (1) They cannot capture the global data structure well. Thus, their visualization results are sensitive to initialization, which may confuse data analysis. (2) They cannot scale to large-scale datasets. They are not suitable for implementation on the GPU platform because their complex algorithm logic, high memory cost, and random memory access pattern lead to low hardware utilization. To address these problems, we propose a novel visualization approach named Anchor-t-SNE (AtSNE), which provides an efficient GPU-based visualization solution for large-scale and high-dimensional data. Specifically, we generate a number of anchor points from the original data and regard them as the skeleton of the layout, which holds the global structure information. We propose a hierarchical optimization approach to optimize the positions of the anchor points and ordinary data points in the layout simultaneously. Our approach produces much better and more robust visual effects on 11 public datasets and achieves a 5 to 28 times speed-up on different datasets, compared with the current state-of-the-art methods. In particular, we deliver a high-quality 2-D layout for a 20-million-point, 96-dimensional dataset within 5 hours, while the current methods fail to give results because they run out of memory.
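To make the anchor-as-skeleton idea concrete, the following is a minimal sketch and not the AtSNE implementation. The abstract does not say how anchors are generated or how the layout is optimized, so everything here is an assumption for illustration: anchors are chosen with plain k-means, the anchor layout uses PCA as a stand-in for the t-SNE-style optimization, and the function names `pick_anchors`, `pca_2d`, and `skeleton_init` are hypothetical. It only shows how a small set of anchors can fix the global arrangement before ordinary points are placed around them.

```python
import numpy as np


def _nearest_anchor(X, anchors):
    # Squared Euclidean distance from every point to every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def pick_anchors(X, n_anchors, n_iter=10, seed=0):
    """Assumed anchor generation: plain Lloyd's k-means on the raw data."""
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(len(X), size=n_anchors, replace=False)].copy()
    for _ in range(n_iter):
        assign = _nearest_anchor(X, anchors)
        for k in range(n_anchors):
            members = X[assign == k]
            if len(members):  # keep the old anchor if its cluster is empty
                anchors[k] = members.mean(axis=0)
    # Recompute so the assignment matches the returned anchors.
    return anchors, _nearest_anchor(X, anchors)


def pca_2d(A):
    """Stand-in anchor layout: project onto the top two principal axes."""
    centered = A - A.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def skeleton_init(X, n_anchors=8, jitter=1e-2, seed=0):
    """Place each ordinary point near the 2-D position of its anchor."""
    anchors, assign = pick_anchors(X, n_anchors, seed=seed)
    anchor_xy = pca_2d(anchors)
    rng = np.random.default_rng(seed)
    noise = jitter * rng.standard_normal((len(X), 2))
    point_xy = anchor_xy[assign] + noise
    return anchor_xy, point_xy, assign


if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((200, 10))
    anchor_xy, point_xy, assign = skeleton_init(X)
    print(anchor_xy.shape, point_xy.shape)
```

In this sketch the anchors are laid out first and ordinary points inherit their positions, which is a sequential simplification. The abstract describes optimizing anchor and ordinary point positions simultaneously, which this example does not attempt.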

Visualizing Large-Scale and High-Dimensional Data

Visualizing Large-Scale High-Dimensional Data Via Hierarchical Embedding of KNN Graphs

Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis

Parallel Visualization for Large-Scale Datasets

Deep Manifold Computing and Visualization

Deep Manifold Computing and Visualization Using Elastic Locally Isometric Smoothness

Structure-preserving visualization for single-cell RNA-Seq profiles using deep manifold transformation with batch-correction

Efficiently Visualizing Large Graphs

AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization

Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections

Scalable multivariate volume visualization and analysis

Scalable Multivariate Volume Visualization and Analysis based on Dimension Projection and Parallel Coordinates

Laplacian-based Cluster-Contractive t-SNE for High-Dimensional Data Visualization

An Analysis of the t-SNE Algorithm for Data Visualization

HiVision: Rapid visualization of large-scale spatial vector data

A New Projection Pursuit Index for Big Data

Mesoscopic structure graphs for interpreting uncertainty in non-linear embeddings

T-Sne for Complex Multi-Manifold High-Dimensional Data

Large-Scale Time-Varying Data Volume Rendering and Feature Tracking

Joint Characterization of Multiscale Information in High Dimensional Data

Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets