Abstract: Embedding learning transforms discrete data entities into continuous numerical representations that encode the features/properties of those entities. Despite the outstanding performance reported for different embedding learning algorithms, little effort has been devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features to the less-interpretable embedding vectors. An interactive visualization tool, built on EmbeddingTree, is also developed to explore high-dimensional embeddings. The tool helps users discover nuanced features of data entities, perform feature denoising/injection in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.

What problem does this paper attempt to address?

The paper mainly addresses the following issues:

### Research Background and Objectives

- **Background**: Embedding learning is a powerful technique that transforms discrete data entities into continuous numerical representations to encode the features or attributes of data entities. Although different embedding learning algorithms perform well in various tasks, there is relatively little work on explaining how these algorithms encode features in the learned embedding space.
- **Objective**: This paper aims to propose a hierarchical embedding exploration algorithm `EmbeddingTree` and an interactive visualization tool to address the interpretability of embedding representations. Specifically, this method can relate the semantic relationships between data entity features and their corresponding, harder-to-interpret embedding vectors.

### Problems Addressed

1. **Structurally explaining feature encoding in embeddings**: Through `EmbeddingTree`, the authors aim to structurally explain how features are encoded in the learned embedding space.
2. **Improving the interpretability of embeddings**: By developing an interactive visualization tool, it helps users discover subtle features of data entities, perform feature denoising/injection during embedding training, and generate embedding representations for unseen data entities.
3. **Hierarchical feature exploration**: For cases where the importance of features in certain datasets varies and should be explored hierarchically, a hierarchical exploration method is proposed, where features form a nested structure, and users can explore from top to bottom.
4. **Handling feature and embedding inconsistency**: By constructing `EmbeddingTree`, it is possible to analyze the hierarchical importance of data entity features in embeddings, thereby discovering potential inconsistencies between features and embeddings.

### Main Contributions

1. **Proposed an algorithm based on Gaussian Mixture Model (GMM)** to extract feature hierarchies from high-dimensional embeddings.
2. **Developed a visual analysis tool** to help users effectively explore embedding data based on the extracted hierarchies.
3. **Case studies**: Demonstrated the effectiveness of the `EmbeddingTree` algorithm and its visualization tool, including studies on merchant embeddings from credit card transaction data and user/track embeddings from the public 30Music dataset.
4. **Application scenarios**: For example, for new merchants, even with limited historical information, `EmbeddingTree` can quickly find the most similar merchant groups to initialize their embedding representations.

In summary, the main goal of this paper is to improve the interpretability of embedding representations, especially in scenarios with obvious feature hierarchies, by proposing novel methods and techniques to achieve this goal.
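
To make the GMM-based hierarchy extraction and the new-merchant initialization more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each candidate feature is scored by fitting one diagonal-covariance Gaussian per feature value (entities are assigned to components by their feature value, not by EM) and comparing the resulting mixtures with BIC; the summary above does not say which criterion or covariance structure the paper uses. All names (`split_score`, `build_tree`, `init_embedding`) and the toy data are hypothetical, and the code uses only the Python standard library so that it stays self-contained.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def diag_gaussian_loglik(points):
    """Log-likelihood of points under a diagonal Gaussian fitted to them."""
    n = len(points)
    total = 0.0
    for col in zip(*points):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n + 1e-6  # variance floor (assumption)
        total += sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
                     for x in col)
    return total

def split_score(embeddings, labels):
    """BIC of a mixture with one Gaussian per feature value (lower is better)."""
    groups = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        groups[label].append(emb)
    n, d = len(embeddings), len(embeddings[0])
    loglik = 0.0
    for pts in groups.values():
        # Hard assignment by feature value: add the log mixing weight per point.
        loglik += diag_gaussian_loglik(pts) + len(pts) * math.log(len(pts) / n)
    n_params = len(groups) * 2 * d + (len(groups) - 1)  # means, variances, weights
    return n_params * math.log(n) - 2 * loglik

def build_tree(embeddings, features, min_size=5):
    """Recursively split on the best-scoring remaining feature.

    This sketch always splits while features remain and the node is large
    enough; it does not check whether splitting beats not splitting.
    """
    node = {"centroid": centroid(embeddings), "size": len(embeddings),
            "feature": None, "children": {}}
    if not features or len(embeddings) < min_size:
        return node
    best = min(features, key=lambda f: split_score(embeddings, features[f]))
    node["feature"] = best
    rest = {f: v for f, v in features.items() if f != best}
    for value in set(features[best]):
        idx = [i for i, label in enumerate(features[best]) if label == value]
        node["children"][value] = build_tree(
            [embeddings[i] for i in idx],
            {f: [v[i] for i in idx] for f, v in rest.items()},
            min_size,
        )
    return node

def init_embedding(tree, entity):
    """Walk down the tree using the entity's known features; return that node's centroid."""
    node = tree
    while node["feature"] is not None:
        child = node["children"].get(entity.get(node["feature"]))
        if child is None:  # feature missing or value unseen: stop at this level
            break
        node = child
    return node["centroid"]

# Toy data (hypothetical): "category" separates the embeddings, "region" does not.
EMB = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1),
       (5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9)]
FEATS = {"category": ["A"] * 4 + ["B"] * 4,
         "region": ["x", "y"] * 4}

tree = build_tree(EMB, FEATS)
new_vec = init_embedding(tree, {"category": "B"})  # entity with no history
```

On this toy data the root split is chosen on `category`, because the per-category Gaussians are much tighter than the per-region ones, and the unseen entity is initialized with the centroid of the `B` group. A fuller version would likely compare each split against the unsplit node and might use full covariances, for example via a library GMM.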

EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding
