Navigating Perplexity: A linear relationship with the data set size in t-SNE embeddings

Martin Skrodzki, Nicolas F. Chaves-de-Plaza, Thomas Höllt, Elmar Eisemann, Klaus Hildebrandt
2024-12-05
Abstract: Widely used pipelines for analyzing high-dimensional data utilize two-dimensional visualizations. These are created, for instance, via t-distributed stochastic neighbor embedding (t-SNE). A crucial element of the t-SNE embedding procedure is the perplexity hyperparameter. That is because the embedding structure varies when perplexity is changed. A suitable perplexity choice depends on the data set and the intended usage for the embedding. Therefore, perplexity is often chosen based on heuristics, intuition, and prior experience. This paper uncovers a linear relationship between perplexity and the data set size. Namely, we show that embeddings remain structurally consistent across data set samples when perplexity is adjusted accordingly. Qualitative and quantitative experimental results support these findings. This informs the visualization process, guiding the user when choosing a perplexity value. Finally, we outline several applications for the visualization of high-dimensional data via t-SNE based on this linear relationship.
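As a rough illustration of the linear relationship described in the abstract, the sketch below rescales a perplexity value when moving from a full data set to a subsample. The abstract does not state the exact form of the relationship, so the proportional rule used here (no offset term) is an assumption, and the function name `scale_perplexity` and the example sizes are hypothetical.

```python
def scale_perplexity(base_perplexity: float, base_size: int, sample_size: int) -> float:
    """Rescale perplexity in proportion to the data set size.

    Assumption: the linear relationship is a pure proportionality
    (perplexity / size stays constant); the paper may use a different form.
    """
    if base_size <= 0 or sample_size <= 0:
        raise ValueError("data set sizes must be positive")
    if base_perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return base_perplexity * sample_size / base_size


# Hypothetical example: a perplexity chosen for 10,000 points,
# rescaled for smaller subsamples of the same data set.
full_size = 10_000
full_perplexity = 30.0
for n in (1_000, 2_500, 5_000):
    print(n, scale_perplexity(full_perplexity, full_size, n))
```

The returned value could then be passed as the perplexity setting of a t-SNE implementation when embedding the subsample, so that the subsample embedding is computed at a comparable scale to the full one.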
Machine Learning, Artificial Intelligence, Quantitative Methods