Abstract: Effectively combining the topological information and attribute information of nodes is a valuable task in network embedding. Nevertheless, many prior network embedding methods treated node attributes as simple attribute sets or ignored them entirely. In some scenarios, the hidden information contained in vertex attributes is essential to network embedding. For instance, networks whose vertices carry text information, including citation networks, social networks, and entry networks, play an increasingly important role in our lives. In these textual networks, the latent topic relevance between vertices, which is contained in their textual attributes, is valuable for network analysis. Shared latent topics may influence the interaction between nodes, which is critical to network embedding. However, much prior work on textual network embedding regarded the text information only as simple word sets and ignored the embedded topic information. In this paper, we develop a model named Topical Adversarial Capsule Network (TACN) for textual network embedding, which extracts a low-dimensional latent space of the original network from node structures, vertex attributes, and the topic information contained in the text of nodes. The proposed TACN contains three parts. The first part is an embedding model, which extracts the embedding representation from the topological structure, vertex attributes, and document-topic distributions. To ensure a consistent training process by back-propagation, we generate document-topic distributions with a neural topic model based on Gaussian Softmax constructions. The second part is a prediction model, which is used to exploit the labels of vertices. The third part is an adversarial capsule model, which helps distinguish whether a latent representation comes from the node structure domain, the vertex attribute domain, or the document-topic distribution domain. The latent representations, which may come from any of the three domains, are the output of the embedding model. We incorporate the adversarial idea into the adversarial capsule model to combine the information from these three domains, rather than to distinguish the representations conventionally. Experiments on seven real-world datasets validate the effectiveness of our method.
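
The Gaussian Softmax construction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a Gaussian sample is projected linearly to topic logits and normalized with a softmax; the function name, parameter names, and toy values below are hypothetical and are not taken from the TACN implementation.

```python
import math
import random


def gaussian_softmax(mu, log_sigma, weights, rng):
    """Sample one document-topic distribution (illustrative only).

    mu, log_sigma: Gaussian parameters for a document, each of length H.
    weights: K rows of length H, a hypothetical projection to K topics.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Reparameterized sample: x = mu + sigma * eps, with eps ~ N(0, 1).
    x = [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]
    # Linear projection of the Gaussian sample to K topic logits.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: a 2-dimensional Gaussian mapped to 3 topics.
rng = random.Random(0)
theta = gaussian_softmax(
    mu=[0.1, -0.3],
    log_sigma=[-1.0, -1.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    rng=rng,
)
```

Because the noise enters only through `eps`, the sample is a deterministic function of `mu` and `log_sigma` once `eps` is drawn. In an automatic-differentiation framework, this is what would let gradients flow through the topic distribution during back-propagation.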

Topical Paragraph Vector learning

Knowledge-based Document Embedding for Cross-Domain Text Classification

Topical Word Embeddings

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Generative Paragraph Vector

Vector-Quantization-Based Topic Modeling

Analysis of the Paragraph Vector Model for Information Retrieval

Topic Modeling Using Distributed Word Embeddings

Topic2Vec: Learning distributed representations of topics

Sentence Vector Model Based on Implicit Word Vector Expression

A Study of Text Vectorization Method Combining Topic Model and Transfer Learning

Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding

Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

TACN: A Topical Adversarial Capsule Network for textual network embedding

Category Enhanced Word Embedding

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Spherical Paragraph Model

Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

Multi-view visual semantic embedding for cross-modal image–text retrieval

Improving Language Estimation with the Paragraph Vector Model for Ad-Hoc Retrieval

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings