A Survey on Network Embedding

Peng Cui, Xiao Wang, Jian Pei, Wenwu Zhu
DOI: https://doi.org/10.48550/arXiv.1711.08752
2017-11-23
Abstract: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information, and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are also reviewed. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.
Social and Information Networks
What problem does this paper attempt to address?
This paper aims to address several key challenges in network data representation that impede effective large-scale network processing and analysis. Specifically, the paper attempts to solve the following problems:

1. **High computational complexity**: In traditional network representations, the relationships between nodes are encoded by the edge set \(E\), so most network processing or analysis algorithms require iterative or combinatorial computational steps, which leads to high computational complexity. For example, calculating the shortest path length or the average path length between two nodes requires enumerating many possible paths, which is a combinatorial problem. Moreover, methods for evaluating node importance usually require iteratively performing a random node traversal process until convergence, which also leads to high computational complexity.
2. **Low parallelism**: Parallel and distributed computing are the de facto standards for processing and analyzing large-scale data. However, there are serious difficulties in designing and implementing parallel and distributed algorithms for network data in the traditional representation. The coupling relationships between nodes (explicitly reflected by \(E\)) make the communication cost between servers very high when different nodes are assigned to different shards or servers, limiting the speedup ratio.
3. **Inapplicability of machine learning methods**: In recent years, machine learning methods (especially deep learning) have shown great capabilities in many fields, providing standard, general, and effective solutions. However, for network data in the traditional representation, most off-the-shelf machine learning methods may not be applicable. These methods usually assume that data samples can be represented as independent vectors in a vector space, while there is a certain degree of dependency between nodes in network data, which is determined by \(E\).
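The iterative node-importance computation mentioned under the first problem can be illustrated with a PageRank-style power iteration. This is a minimal sketch, not a method taken from the paper: the function name `pagerank_power_iteration`, the damping factor of 0.85, the tolerance, and the six-node toy graph are all assumptions chosen for illustration, and the code assumes every node has at least one edge.

```python
import numpy as np

def pagerank_power_iteration(A, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate a random-walk update until the scores stop changing.

    A is a dense adjacency matrix; every node is assumed to have
    at least one outgoing edge (no dangling nodes).
    """
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Row-normalise so that P[i, j] is the probability of stepping i -> j.
    P = A / out_degree[:, None]
    rank = np.full(n, 1.0 / n)
    for iteration in range(1, max_iter + 1):
        # One random-walk step with uniform teleportation.
        new_rank = (1.0 - damping) / n + damping * (P.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank, iteration
        rank = new_rank
    return rank, max_iter

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = 1.0
    A[v, u] = 1.0

scores, iterations_used = pagerank_power_iteration(A)
```

Each pass touches every edge once, and many passes are needed before the scores settle, which is the cost the summary refers to.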
Although a node can be simply represented by its corresponding row vector in the network adjacency matrix, the high dimensionality of this representation in large graphs makes subsequent network processing and analysis difficult.

To address these problems, the paper focuses on network embedding, an emerging network analysis paradigm. The goal of network embedding is to learn low-dimensional vector representations of network nodes, preserve the relationships between nodes in the embedding space, and encode the topological and structural features of nodes. In this way, network embedding can support subsequent network processing and analysis tasks, such as node classification, node clustering, network visualization, and link prediction.

The main contributions of the paper include:

- **Review of network embedding methods**: Classify and review existing network embedding methods, covering structure- and property-preserving network embedding methods, network embedding methods with auxiliary information, and advanced information-preserving network embedding methods.
- **Evaluation methods and resources**: Introduce several evaluation methods for network embedding and some useful online resources, including network datasets and software.
- **Future research directions**: Discuss how to use network embedding methods to build effective systems and point out some potential future research directions.
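The contrast drawn above between an adjacency-matrix row and a low-dimensional node vector can be made concrete with a small example. The sketch below uses the leading eigenvectors of the adjacency matrix as a simple stand-in for an embedding method; it is an illustrative assumption, not one of the specific methods reviewed in the survey, and the helper names `adjacency_matrix` and `spectral_embedding` as well as the toy graph are hypothetical.

```python
import numpy as np

def adjacency_matrix(num_nodes, edges):
    """Build a dense symmetric adjacency matrix for an undirected graph."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = 1.0
        A[v, u] = 1.0
    return A

def spectral_embedding(A, dim):
    """Return one dim-dimensional vector per node.

    np.linalg.eigh returns eigenvalues in ascending order, so the last
    `dim` columns are the eigenvectors with the largest eigenvalues.
    """
    _, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -dim:]

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = adjacency_matrix(6, edges)

# Each adjacency row has one entry per node; each embedding row has only 2.
Z = spectral_embedding(A, dim=2)

def distance(i, j):
    """Euclidean distance between two nodes in the embedding space."""
    return float(np.linalg.norm(Z[i] - Z[j]))

same_triangle = distance(0, 2)
different_triangle = distance(0, 5)
```

On this toy graph, nodes from the same triangle end up closer together in the two-dimensional space than nodes from different triangles, which is the sense in which the embedding preserves structure. Once nodes are plain vectors, downstream tasks such as classification or clustering can be handled with standard vector-based tools.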