A Novel Framework with Information Fusion and Neighborhood Enhancement for User Identity Linkage

Siyuan Chen, Jiahai Wang, Xin Du, Yanqing Hu
DOI: https://doi.org/10.48550/arXiv.2003.07122
2020-03-16
Abstract: User identity linkage across social networks is an essential problem for cross-network data mining. Since network structure, profile and content information describe different aspects of users, it is critical to learn effective user representations that integrate heterogeneous information. This paper proposes a novel framework with INformation FUsion and Neighborhood Enhancement (INFUNE) for user identity linkage. The information fusion component adopts a group of encoders and decoders to fuse heterogeneous information and generate discriminative node embeddings for preliminary matching. Then, these embeddings are fed to the neighborhood enhancement component, a novel graph neural network, to produce adaptive neighborhood embeddings that reflect the overlapping degree of neighborhoods of varying candidate user pairs. The importance of node embeddings and neighborhood embeddings are weighted for final prediction. The proposed method is evaluated on real-world social network data. The experimental results show that INFUNE significantly outperforms existing state-of-the-art methods.
Social and Information Networks, Information Retrieval
What problem does this paper attempt to address?
The paper addresses user identity linkage across social networks. Users often hold accounts on several social networks, and identifying which accounts belong to the same person is crucial for cross-network data mining. The correspondence between accounts on different networks is usually unavailable, which is what makes user identity linkage an important research problem.

### Main contributions of the paper

1. **Information fusion component**: The paper proposes an information fusion component that fuses users' structure, profile and content information simultaneously. The summary presents this as the first embedding-based method to fuse all three types of information at once.
2. **Neighborhood enhancement component**: To exploit the information carried by potentially matching neighbors, the paper proposes a new graph neural network model that learns neighborhood representations that vary with the candidate user pair.
3. **Experimental verification**: Extensive experiments verify the effectiveness of INFUNE, and the results show that it significantly outperforms existing state-of-the-art methods.

### Method overview

The INFUNE framework contains two main components:

- **Information fusion component**: A set of encoders and decoders fuses heterogeneous information and generates node embeddings for preliminary matching.
- **Neighborhood enhancement component**: Based on the node embeddings, the potentially matching neighbors of each candidate user pair are identified, and a new graph neural network model learns neighborhood embeddings that adapt to that pair.

### Formulas and technical details

1. **Feature embedding**:
   \[ z_\alpha=\text{ENC}_\alpha(x) = W_{\alpha 2}\tanh(W_{\alpha 1}x + b_{\alpha 1})+b_{\alpha 2} \]
   where \(z_\alpha\) is the feature embedding (written \(z_i^\alpha\) for user \(u_i\) in the formulas that follow) and \(\text{ENC}_\alpha\) is the feature-specific encoder.
2. **Similarity metric**:
   \[ g_{ij}^\alpha=\text{sim}_\alpha(u_i, u_j) \]
   where \(g_{ij}^\alpha\) is the true similarity between users \(u_i\) and \(u_j\) on feature \(\alpha\).
3. **Reconstructed similarity**:
   \[ r_{ij}^\alpha=\text{DEC}_\alpha(z_i^\alpha, z_j^\alpha) \]
   where \(r_{ij}^\alpha\) is the reconstructed similarity and \(\text{DEC}_\alpha\) is the feature-specific decoder.
4. **Loss function**:
   \[ L_\alpha=\frac{1}{N_1N_2}\sum_{u_i\in U_1}\sum_{u_j\in U_2}\ell_\alpha(r_{ij}^\alpha, g_{ij}^\alpha) \]
   where \(\ell_\alpha\) is the squared-loss function.
5. **Total objective function**:
   \[ L_{\text{all}}=L_{\text{label}}+\sum_{\alpha\in\{s, p, c\}}L_\alpha \]
6. **Neighborhood enhancement component**:
   - **Potentially matching neighbors**:
     \[ N_i^+=\{u_n\in N_i\mid\text{Potentially matching}\} \]
   - **Neighborhood embedding**:
     \[ h_i^+=\text{GCN}(N_i^+)=\frac{1}{|N_i^+|}\sum_{u_n\in N_i^+}z_n \]
   - **Total neighborhood**
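
The formulas for the encoder, the reconstruction loss and the neighborhood embedding can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the inner-product decoder, the sharing of one encoder across both networks, the random toy data and all function names are hypothetical choices made here, and the rule that selects potentially matching neighbors is taken as given because the summary does not specify it.

```python
import numpy as np


def encode(x, W1, b1, W2, b2):
    """Two-layer encoder W2 @ tanh(W1 @ x + b1) + b2, applied to each row of x."""
    return np.tanh(x @ W1.T + b1) @ W2.T + b2


def decode(z1, z2):
    """ASSUMED decoder: inner product between every pair of embeddings."""
    return z1 @ z2.T


def feature_loss(r, g):
    """Squared loss averaged over all N1 * N2 user pairs."""
    return float(np.mean((r - g) ** 2))


def neighborhood_embedding(z, matching_neighbors):
    """Mean node embedding over the potentially matching neighbors.

    `matching_neighbors` is a non-empty list of row indices into z; how the
    neighbors are selected is outside the scope of this sketch.
    """
    return z[matching_neighbors].mean(axis=0)


rng = np.random.default_rng(0)
d_in, d_hid, d_out = 6, 5, 4

# Toy encoder parameters (ASSUMPTION: shared by both networks).
W1, b1 = rng.normal(size=(d_hid, d_in)), np.zeros(d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), np.zeros(d_out)

x1 = rng.normal(size=(3, d_in))  # features of N1 = 3 users in network 1
x2 = rng.normal(size=(2, d_in))  # features of N2 = 2 users in network 2

z1 = encode(x1, W1, b1, W2, b2)
z2 = encode(x2, W1, b1, W2, b2)

r = decode(z1, z2)               # reconstructed similarities, shape (3, 2)
g = rng.uniform(size=(3, 2))     # stand-in for the true similarities
loss = feature_loss(r, g)

# Neighborhood embedding for a user whose matching neighbors are users 0 and 2.
h = neighborhood_embedding(z1, [0, 2])
```

In the total objective, one such loss per feature type (structure, profile, content) would be added to the label loss.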