Supervised Stochastic Neighbor Embedding Using Contrastive Learning

Yi Zhang
DOI: https://doi.org/10.48550/arXiv.2309.08077
2023-09-15
Abstract: Stochastic neighbor embedding (SNE) methods such as $t$-SNE and UMAP are the two most popular dimensionality reduction methods for data visualization. Contrastive learning, especially self-supervised contrastive learning (SSCL), has shown great success in embedding features from unlabeled data. The conceptual connection between SNE and SSCL has been exploited in prior work. In this work, within the scope of preserving the neighborhood information of a dataset, we extend the self-supervised contrastive approach to the fully supervised setting, allowing us to effectively leverage label information. Clusters of samples belonging to the same class are pulled together in the low-dimensional embedding space, while clusters of samples from different classes are simultaneously pushed apart.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is extending self-supervised contrastive learning (SSCL) to the fully supervised setting in order to improve existing stochastic neighbor embedding (SNE) methods. Specifically, the author aims to exploit label information during dimensionality reduction, so that samples of the same class are clustered together in the low-dimensional embedding space while clusters of different classes are pushed apart.

### Main Problems and Goals

1. **Combining SNE and Contrastive Learning**:
   - Traditional SNE methods such as t-SNE and UMAP perform well for data visualization, but they reduce dimensionality in an unsupervised manner.
   - Contrastive learning, especially self-supervised contrastive learning, has achieved great success in feature embedding for unlabeled data.
2. **Introducing Supervision Information**:
   - Incorporating supervision (i.e., labels) can further improve the embedding, especially on multi-class datasets.
   - The paper proposes a supervised contrastive SNE method that fully exploits label information when optimizing the embedding.
3. **Unified Framework**:
   - The paper proposes a unified PyTorch framework that implements parametric and non-parametric variants of both supervised and unsupervised contrastive SNE.
   - The framework can re-implement existing SNE methods and can be adapted to different applications by changing the loss function.

### Specific Contributions

- **Unified PyTorch Framework**: applicable to (non-)parametric supervised and unsupervised contrastive SNE methods.
- **General Loss Function**: a single loss function that covers both supervised and unsupervised learning.
- **Analysis Results**: detailed theoretical analysis and experimental validation demonstrating the effectiveness and superiority of the new method.

### Summary of Mathematical Formulas

The key formulas of the method are the following:

1. **High-Dimensional Similarity Distribution**:

   \[ \text{sim}(x_i, x_j) = p_{ij} = \frac{\mathbb{I}[ij \in P]}{|P|} \]

   where \(P\) is the set of all positive pairs and \(\mathbb{I}\) is the indicator function, so \(p\) is uniform over the positive pairs.

2. **Low-Dimensional Similarity Distribution**:

   \[ \text{sim}(z_i, z_j) = \phi_{ij} = \phi(d_{ij}) = \frac{1}{d_{ij}^2 + 1}, \qquad q_{\theta, ij} = \frac{\phi_{ij}}{\sum_{k \neq l} \phi_{kl}} \]

   where \(d_{ij} = \lVert z_i - z_j \rVert\) is the distance between the embeddings and \(\phi\) is the Cauchy kernel. Normalizing over all pairs \(k \neq l\) makes \(q_\theta\) a distribution and produces the log-sum term in the t-SNE loss.

3. **t-SNE Loss Function**:

   \[ L_{t\text{-SNE}}^\theta = -\mathbb{E}_{ij \sim p} \log q_{\theta, ij} = -\frac{1}{|P|} \sum_{ij \in P} \log \phi_{ij} + \log \Big( \sum_{k \neq l} \phi_{kl} \Big) \]

   The first term attracts positive pairs; the second, the log of the normalization constant, repels all pairs.

4. **Supervised Contrastive Loss Function**:

   \[ L_{\text{Contrastive}}^\theta = -\frac{1}{|B|} \sum_{i \in B} \left( \frac{1}{|\tilde{P}|} \sum_{ij \in \tilde{P}} \log \frac{\exp (\text{sim}(z_i, z_j) / \tau)}{\sum_{ik \in N} \exp (\text{sim}(z_i, z_k) / \tau)} \right) \]

   where \(B\) is the mini-batch, \(\tilde{P}\) the positive pairs anchored at sample \(i\) (samples sharing its label), \(N\) the pairs \(ik\) over which the denominator is taken, and \(\tau\) a temperature parameter.

Through these formulas, the paper shows how the idea of contrastive learning can be applied to SNE in the supervised setting.
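The formulas above can be illustrated on a toy batch in plain Python. This is a minimal sketch under stated assumptions, not the paper's PyTorch implementation: the helper names (`cauchy_kernel`, `label_positive_pairs`, `tsne_style_loss`, `sup_contrastive_loss`) are hypothetical; positives are assumed to be all pairs sharing a label; `sim(z_i, z_j)` is taken to be the Cauchy kernel \(\phi_{ij}\); the denominator set \(N\) is assumed to be every other sample in the batch, as in standard supervised contrastive learning; and `tau = 0.5` is an arbitrary default.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cauchy_kernel(zi: Vector, zj: Vector) -> float:
    """phi(d_ij) = 1 / (d_ij^2 + 1), with d_ij the Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(zi, zj))
    return 1.0 / (d2 + 1.0)


def label_positive_pairs(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Hypothetical positive set P: ordered pairs (i, j), i != j, with equal labels."""
    n = len(labels)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and labels[i] == labels[j]]


def tsne_style_loss(Z: Sequence[Vector], P: List[Tuple[int, int]]) -> float:
    """-E_{ij~p} log q_ij, with p uniform over P and q_ij = phi_ij / sum_{k!=l} phi_kl."""
    n = len(Z)
    # Normalizer sum_{k!=l} phi_kl; its log is the repulsive term.
    partition = sum(cauchy_kernel(Z[k], Z[l])
                    for k in range(n) for l in range(n) if k != l)
    # Mean negative log-kernel over positive pairs: the attractive term.
    attraction = -sum(math.log(cauchy_kernel(Z[i], Z[j])) for i, j in P) / len(P)
    return attraction + math.log(partition)


def sup_contrastive_loss(Z: Sequence[Vector], labels: Sequence[int],
                         tau: float = 0.5) -> float:
    """Supervised contrastive loss with sim(z_i, z_j) = phi_ij.

    Assumptions: the denominator runs over every k != i, and anchors without
    any same-label partner are skipped when averaging over the batch.
    """
    n = len(Z)
    per_anchor = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # log sum_{k != i} exp(sim(z_i, z_k) / tau)
        log_denom = math.log(sum(math.exp(cauchy_kernel(Z[i], Z[k]) / tau)
                                 for k in range(n) if k != i))
        log_probs = [cauchy_kernel(Z[i], Z[j]) / tau - log_denom for j in positives]
        # Average over this anchor's positives, negated.
        per_anchor.append(-sum(log_probs) / len(positives))
    if not per_anchor:
        raise ValueError("no anchor has a same-label partner")
    return sum(per_anchor) / len(per_anchor)


labels = [0, 0, 1, 1]
separated = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]  # one cluster per class
mixed = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.1], [5.0, 5.1]]      # each class split in two
P = label_positive_pairs(labels)
print(tsne_style_loss(separated, P) < tsne_style_loss(mixed, P))  # True
print(sup_contrastive_loss(separated, labels) < sup_contrastive_loss(mixed, labels))  # True
```

Placing each class in its own cluster lowers both losses, which matches the pull-together, push-apart behavior described above. The explicit loops keep the sketch dependency-free; a practical version would vectorize over the batch, for example with PyTorch tensors.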