Convergence analysis of t-SNE as a gradient flow for point cloud on a manifold

Seonghyeon Jeong, Hau-Tieng Wu
2024-01-31
Abstract: We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iteration with Kullback-Leibler (KL) divergence as the objective function, aiming to identify a set of points that closely resemble the original data points in a high-dimensional space, minimizing KL divergence. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. Demonstrating that points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer for KL divergence.
Machine Learning, Data Structures and Algorithms
What problem does this paper attempt to address?
The paper studies how the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm behaves on point clouds sampled from a manifold, and analyzes its convergence by treating it as a gradient flow. The main issues it addresses are summarized below.

1. **Research Background and Motivation**:
   - t-SNE is a widely used nonlinear dimensionality reduction method for data visualization.
   - Despite its strong performance in practice, theoretical support for t-SNE is relatively limited, especially regarding its iterative nature.
2. **Core Issues**:
   - **Divergence during the Iterative Process**: Whether the embedded points produced by t-SNE can escape to infinity as the iteration proceeds on high-dimensional input data.
   - **Existence of Global Minimum**: Whether the Kullback-Leibler (KL) divergence used by t-SNE attains a global minimum under specific conditions.
3. **Main Contributions**:
   - **Boundedness of the Embedding**: The paper proves that the embedded points generated by t-SNE are uniformly bounded in a 2-dimensional space.
   - **Existence of Global Minimum**: Building on the boundedness result, it proves that the KL divergence has a global minimum.
4. **Technical Details**:
   - **Discussion of Perplexity Parameter**: The paper discusses in detail the properties of the perplexity parameter in t-SNE, including its range, uniqueness, and stability.
   - **Continuous Gradient Flow**: t-SNE is viewed as a continuous gradient flow, and the key conclusions are derived by analyzing the gradient flow equations.
   - **Changes in Mutual Distances**: Using the gradient flow equations and the structure of the perplexity parameter, the paper studies how the mutual distances between embedded points change in order to infer the overall behavior.
5. **Organizational Structure**:
   - The paper first reviews the basic principles of the t-SNE algorithm.
   - It then states the assumptions used for analyzing t-SNE, including assumptions about the support of the input dataset.
   - Next, it examines the properties of the perplexity parameter.
   - Finally, it presents the main theorems and their proofs in detail.

In summary, the paper aims to fill a gap in the theoretical foundation of t-SNE. Through mathematical analysis, it proves that the embedding stays bounded and that the KL objective has a minimizer, which gives the algorithm's practical use a firmer theoretical basis.
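
The ingredients listed under Technical Details (perplexity calibration, the KL objective, and the gradient flow) can be illustrated with a small pure-Python sketch. It is not the authors' code. The function names (`affinities`, `kl_and_grad`, `euler_flow`), the toy data (12 points on a circle in R^3), the base-2 perplexity convention, the step size, and the initialization scale are all assumptions made for illustration. The affinity and gradient formulas are the standard ones from the original t-SNE formulation, and an explicit Euler iteration stands in for the continuous flow, which the paper analyzes directly.

```python
import math
import random


def sq_dists(Z):
    """Pairwise squared Euclidean distances of a list of points."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in Z] for u in Z]


def row_perplexity(row):
    """Perplexity 2^H of one conditional distribution p_{.|i} (base-2 convention)."""
    return 2.0 ** (-sum(p * math.log2(p) for p in row if p > 0.0))


def affinities(X, perplexity, tol=1e-6, max_iter=200):
    """Symmetrized affinities p_ij; each row's bandwidth is found by bisection."""
    n, D = len(X), sq_dists(X)
    cond = []
    for i in range(n):
        dmin = min(D[i][j] for j in range(n) if j != i)  # shift for numerical stability
        lo, hi, beta = 0.0, None, 1.0  # beta plays the role of 1 / (2 sigma_i^2)
        for _ in range(max_iter):
            w = [0.0 if j == i else math.exp(-beta * (D[i][j] - dmin)) for j in range(n)]
            s = sum(w)
            row = [wj / s for wj in w]
            perp = row_perplexity(row)
            if abs(perp - perplexity) < tol:
                break
            if perp > perplexity:  # too spread out: sharpen the kernel
                lo = beta
                beta = 2.0 * beta if hi is None else 0.5 * (lo + hi)
            else:  # too concentrated: widen the kernel
                hi = beta
                beta = 0.5 * (lo + hi)
        cond.append(row)
    P = [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]
    return P, cond


def kl_and_grad(P, Y):
    """KL(P || Q) for the Student-t kernel, and its gradient with respect to Y."""
    n, D = len(Y), sq_dists(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + D[i][j]) for j in range(n)] for i in range(n)]
    Z = sum(map(sum, W))  # normalizer, so that q_ij = W[i][j] / Z
    kl = sum(P[i][j] * math.log(P[i][j] * Z / W[i][j])
             for i in range(n) for j in range(n) if i != j and P[i][j] > 0.0)
    grad = []
    for i in range(n):
        g = [0.0] * len(Y[i])
        for j in range(n):
            c = 4.0 * (P[i][j] - W[i][j] / Z) * W[i][j]
            for k in range(len(g)):
                g[k] += c * (Y[i][k] - Y[j][k])
        grad.append(g)
    return kl, grad


def euler_flow(P, Y, step=0.05, n_steps=400):
    """Explicit Euler discretization of dY/dt = -grad KL; returns final Y and KL trace."""
    trace = []
    for _ in range(n_steps):
        kl, grad = kl_and_grad(P, Y)
        trace.append(kl)
        Y = [[y - step * g for y, g in zip(yi, gi)] for yi, gi in zip(Y, grad)]
    trace.append(kl_and_grad(P, Y)[0])
    return Y, trace


rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(12)]
X = [[math.cos(t), math.sin(t), 0.0] for t in angles]  # toy samples from a circle in R^3
P, cond = affinities(X, perplexity=4.0)
Y0 = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in X]  # assumed small random start
Y, trace = euler_flow(P, Y0)
max_radius = max(math.hypot(*y) for y in Y)
print(f"KL: {trace[0]:.4f} -> {trace[-1]:.4f}, max |y_i| = {max_radius:.3f}")
```

Because perplexity decreases monotonically as `beta` grows, the bisection in `affinities` has at most one solution per row, which mirrors the uniqueness property mentioned above. The printed KL trace and maximum radius only illustrate boundedness over a finite time horizon on one toy dataset; the paper's result concerns the continuous flow for all time.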