Attributed Network Embedding with Micro-Meso Structure

Juan-Hui Li, Ling Huang, Chang-Dong Wang, Dong Huang, Jian-Huang Lai, Pei Chen
DOI: https://doi.org/10.1145/3441486
IF: 4.157
2021-08-31
ACM Transactions on Knowledge Discovery from Data
Abstract: Recently, network embedding has received a large amount of attention in network analysis. Although some network embedding methods have been developed from different perspectives, on one hand, most of the existing methods only focus on leveraging the plain network structure, ignoring the abundant attribute information of nodes. On the other hand, for some methods integrating the attribute information, only the lower-order proximities (e.g., microscopic proximity structure) are taken into account, which may suffer if there exists the sparsity issue and the attribute information is noisy. To overcome this problem, the attribute information and mesoscopic community structure are utilized. In this article, we propose a novel network embedding method termed Attributed Network Embedding with Micro-Meso structure, which is capable of preserving both the attribute information and the structural information including the microscopic proximity structure and mesoscopic community structure. In particular, both the microscopic proximity structure and node attributes are factorized by Nonnegative Matrix Factorization (NMF), from which the low-dimensional node representations can be obtained. For the mesoscopic community structure, a community membership strength matrix is inferred by a generative model (i.e., BigCLAM) or modularity from the linkage structure, which is then factorized by NMF to obtain the low-dimensional node representations. The three components are jointly correlated by the low-dimensional node representations, from which two objective functions (i.e., ANEM_B and ANEM_M) can be defined. Two efficient alternating optimization schemes are proposed to solve the optimization problems. Extensive experiments have been conducted to confirm the superior performance of the proposed models over the state-of-the-art network embedding methods.
computer science, information systems, software engineering
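The abstract says that, for ANEM_B, a community membership strength matrix is inferred from the linkage structure by BigCLAM. To make that matrix concrete, the following is a minimal sketch of the standard BigCLAM model, in which each node has a nonnegative affiliation vector \( F_u \) and an edge appears with probability \( 1 - \exp(-F_u \cdot F_v) \). The optimizer (projected gradient ascent with backtracking), the initialisation, the function names, and the toy two-triangle graph are all assumptions for illustration; this is not the paper's procedure, and how ANEM_B couples \( F \) with the NMF factors is not shown.

```python
import numpy as np


def bigclam_loglik(A, F):
    """BigCLAM log-likelihood for a 0/1 symmetric adjacency A (zero diagonal).

    P(u, v) = 1 - exp(-F_u . F_v); the 0.5 counts each unordered pair once.
    """
    X = F @ F.T
    off_diag = 1.0 - np.eye(A.shape[0])
    # Clip inside the log so pairs with zero affiliation do not give log(0).
    edge_term = A * np.log1p(-np.exp(-np.maximum(X, 1e-8)))
    non_edge_term = (1.0 - A) * off_diag * X
    return 0.5 * float(np.sum(edge_term - non_edge_term))


def bigclam_grad(A, F):
    """Gradient of bigclam_loglik with respect to F (one row per node)."""
    X = np.maximum(F @ F.T, 1e-8)
    P = np.exp(-X)
    off_diag = 1.0 - np.eye(A.shape[0])
    # Edges pull F_u towards F_v; non-edges push it away.
    W = A * P / (1.0 - P) - (1.0 - A) * off_diag
    return W @ F


def fit_bigclam(A, k, iters=100, seed=0):
    """Assumed optimizer: projected gradient ascent with backtracking, F >= 0."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.1, 1.0, size=(A.shape[0], k))
    ll = bigclam_loglik(A, F)
    history = [ll]
    for _ in range(iters):
        g = bigclam_grad(A, F)
        step = 1.0
        while step > 1e-10:
            F_new = np.maximum(F + step * g, 0.0)  # project onto F >= 0
            ll_new = bigclam_loglik(A, F_new)
            if ll_new >= ll:  # accept only non-decreasing steps
                F, ll = F_new, ll_new
                break
            step *= 0.5
        history.append(ll)
    return F, history


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

F, history = fit_bigclam(A, k=2)
```

Each row of `F` is a node's membership strength over the `k` communities; because a step is only accepted when the likelihood does not drop, `history` is non-decreasing.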
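For ANEM_M, the abstract says the community structure comes from modularity instead. The sketch below computes only the textbook quantities: the modularity matrix \( B = A - \frac{k k^\top}{2m} \) (with \( k \) the degree vector and \( m \) the number of edges) and the modularity \( Q = \frac{1}{2m}\operatorname{tr}(H^\top B H) \) of a hard partition with one-hot indicator \( H \). How ANEM_M relaxes this into a nonnegative membership strength matrix is not described in this summary, so that step is left out; the helper names and the two-triangle toy graph are hypothetical.

```python
import numpy as np


def modularity_matrix(A):
    """B = A - k k^T / (2m) for an undirected graph with adjacency A."""
    k = A.sum(axis=1)   # degree vector
    two_m = k.sum()     # 2m = sum of degrees
    return A - np.outer(k, k) / two_m


def modularity(A, labels):
    """Q = tr(H^T B H) / (2m), H being the one-hot community indicator."""
    labels = np.asarray(labels)
    H = np.zeros((A.shape[0], labels.max() + 1))
    H[np.arange(A.shape[0]), labels] = 1.0
    B = modularity_matrix(A)
    return float(np.trace(H.T @ B @ H) / A.sum())


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

B = modularity_matrix(A)
Q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # cut along the bridge
Q_single = modularity(A, [0] * 6)            # everything in one community
```

Splitting the toy graph along the bridge gives \( Q = 5/14 \approx 0.357 \), while a single community gives \( Q = 0 \), since every row of \( B \) sums to zero.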
What problem does this paper attempt to address?
The paper aims to preserve node attribute information, the microscopic proximity structure, and the mesoscopic community structure simultaneously in network embedding. Most existing network embedding methods use only the plain network structure and ignore the rich attribute information of nodes, while methods that do integrate attributes usually consider only low-order proximity (such as the microscopic proximity structure), which can degrade when the network is sparse or the attribute information is noisy. To address these problems, the paper proposes Attributed Network Embedding with Micro-Meso Structure (ANEM), which preserves both the attribute information and the structural information of nodes, including the microscopic proximity structure and the mesoscopic community structure. Specifically, the method has three components:

1. **Microscopic proximity structure**: the first-order and second-order proximities of nodes are combined into a proximity matrix \( S \), which is factorized by Nonnegative Matrix Factorization (NMF) to obtain low-dimensional node representations.
2. **Mesoscopic community structure**: a community membership strength matrix is inferred from the linkage structure of the network by a generative model (BigCLAM) or by modularity, and is then factorized by NMF to obtain low-dimensional node representations.
3. **Node attributes**: a nonnegative basis matrix \( N \) is introduced so that the node attribute matrix \( D \) is approximated within the NMF framework, again yielding low-dimensional node representations.

The three components are coupled through the shared low-dimensional node representations. Jointly optimizing them defines two objective functions, ANEM_B and ANEM_M, which capture the mesoscopic community structure with BigCLAM and with modularity, respectively. Two efficient alternating optimization schemes are proposed to solve the resulting optimization problems.
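The proximity matrix \( S \) is said to combine first-order and second-order proximities, but its exact form is not given here. A construction commonly used by community-preserving NMF embeddings such as M-NMF is \( S = S^{(1)} + \eta S^{(2)} \), where \( S^{(1)} = A \) and \( S^{(2)}_{ij} \) is the cosine similarity of the neighbourhood vectors \( A_i \) and \( A_j \). The sketch assumes that form; the weight `eta = 5.0`, the function name, and the toy graph are hypothetical.

```python
import numpy as np


def proximity_matrix(A, eta=5.0):
    """Assumed construction: S = S1 + eta * S2.

    S1 = A is the first-order proximity (direct links); S2[i, j] is the
    cosine similarity of rows A[i] and A[j] (shared neighbours, i.e.
    second-order proximity).
    """
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0  # isolated nodes keep an all-zero S2 row
    A_hat = A / norms[:, None]
    S2 = A_hat @ A_hat.T
    return A + eta * S2


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

S = proximity_matrix(A, eta=5.0)
```

For example, nodes 0 and 1 are linked and share neighbour 2, so \( S_{01} = 1 + 5 \cdot 0.5 = 3.5 \); nodes 0 and 3 are not linked but share neighbour 2, so \( S_{03} = 5/\sqrt{6} \); nodes 0 and 4 share nothing, so \( S_{04} = 0 \). The second-order term is what lets unlinked but structurally similar nodes receive nonzero proximity, which is the sparsity argument the paper makes.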
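How a shared representation ties the three components together can be illustrated with a simplified surrogate objective, \( \|S - M U^\top\|_F^2 + \alpha \|D - U N^\top\|_F^2 + \beta \|H - U C^\top\|_F^2 \) with \( M, U, N, C \ge 0 \), where \( U \) is the node representation and \( H \) a community membership strength matrix. The sketch below alternates Lee-Seung-style multiplicative updates over \( M, N, C \) and \( U \). The weights `alpha` and `beta`, the orientation \( D \approx U N^\top \), and treating \( H \) as a fixed input are all assumptions: in ANEM_B and ANEM_M the community term is tied to BigCLAM or modularity, which this sketch omits, so it is not the paper's update scheme.

```python
import numpy as np


def joint_nmf(S, D, H, k, alpha=1.0, beta=1.0, iters=200, seed=0, eps=1e-10):
    """Assumption: simplified surrogate, not the exact ANEM_B/ANEM_M objective.

    Minimizes ||S - M U^T||^2 + alpha ||D - U N^T||^2 + beta ||H - U C^T||^2
    over nonnegative M, U, N, C; U (n x k) is shared by all three terms.
    """
    rng = np.random.default_rng(seed)
    n, f = D.shape
    c = H.shape[1]
    M = rng.uniform(0.1, 1.0, (n, k))
    U = rng.uniform(0.1, 1.0, (n, k))
    N = rng.uniform(0.1, 1.0, (f, k))
    C = rng.uniform(0.1, 1.0, (c, k))

    def objective():
        return (np.linalg.norm(S - M @ U.T) ** 2
                + alpha * np.linalg.norm(D - U @ N.T) ** 2
                + beta * np.linalg.norm(H - U @ C.T) ** 2)

    history = [objective()]
    for _ in range(iters):
        # Each basis matrix appears in only one term: standard NMF update.
        M *= (S @ U) / (M @ (U.T @ U) + eps)
        N *= (D.T @ U) / (N @ (U.T @ U) + eps)
        C *= (H.T @ U) / (C @ (U.T @ U) + eps)
        # U is shared, so numerator and denominator sum over all three terms.
        num = S.T @ M + alpha * D @ N + beta * H @ C
        den = U @ (M.T @ M + alpha * N.T @ N + beta * C.T @ C) + eps
        U *= num / den
        history.append(objective())
    return U, history


# Hypothetical toy inputs: 6 nodes, 4 attributes, 2 communities.
rng = np.random.default_rng(1)
S = rng.uniform(0.0, 1.0, (6, 6))
S = (S + S.T) / 2.0
D = rng.uniform(0.0, 1.0, (6, 4))
H = rng.uniform(0.0, 1.0, (6, 2))
U, history = joint_nmf(S, D, H, k=2)
```

Because the three terms can be stacked into a single NMF problem in \( U \), summing their numerators and denominators keeps the usual multiplicative-update guarantee: apart from the tiny `eps`, no step increases the surrogate objective, which the returned `history` lets you check.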
Experimental results show that the proposed ANEM_B and ANEM_M methods outperform most existing network embedding methods in node classification and clustering tasks.