Attributed Network Embedding in Streaming Style

Anbiao Wu, Ye Yuan, Changsheng Li, Yuliang Ma, Hao Zhang
DOI: https://doi.org/10.1109/icde60146.2024.00243
2024-01-01
Abstract: Attributed network embedding (ANE) learns low-dimensional embeddings for nodes in attributed graphs, which facilitates several data analysis tasks. However, existing ANE methods fail to handle scenarios in which attributes are generated continuously. Continuous generation accumulates numerous attributes, incurring high storage costs in existing methods. Furthermore, because storage limitations force old attributes to be discarded as new ones are generated, existing methods struggle to integrate new attribute information into embeddings generated from old attributes. Therefore, we propose a novel ANE framework named SANE (Streaming-style ANE), featuring a “memory” capability: when the embeddings are updated for new attributes, old attribute information can be partly preserved. In SANE, we first define forward and backward affinity between nodes and attributes by viewing a node as a source or target node. This definition guides the quick computation of affinity vectors that integrate both topological and attribute information. Meanwhile, we propose an augmentation strategy that enriches node attribute information to enhance the quality of node embeddings. Leveraging the augmented attributes, we iteratively generate forward and backward affinity vectors, which quantify node-attribute affinity in two directions. Subsequently, we achieve a streaming-style update of node embeddings by applying matrix sketching to these iteratively generated vectors. Furthermore, capitalizing on the mergeability of matrix sketching, we efficiently integrate information from newly generated attributes into node embeddings. Extensive experiments on 5 real datasets demonstrate that SANE surpasses state-of-the-art algorithms in node classification and link prediction. Simulation experiments further validate SANE's ability to incorporate new attribute information into embeddings quickly.
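The abstract does not say which matrix sketching algorithm SANE uses, so the following is only a minimal sketch of the general idea, assuming a Frequent Directions style sketch as a stand-in. The function names (`shrink`, `sketch_stream`, `merge_sketches`) and the sketch size `ell` are hypothetical and do not come from the paper. Each streamed vector is treated as one row; the sketch keeps at most `ell` rows no matter how many vectors arrive, and two sketches built separately (for example, from old and from newly generated attributes) can be merged without revisiting the discarded vectors.

```python
import numpy as np


def shrink(stacked, ell):
    """Compress stacked rows into an (ell x d) sketch.

    Frequent Directions shrink step (an assumed stand-in, not
    necessarily the technique used in the paper): subtract the
    (ell+1)-th squared singular value from every squared singular
    value, which zeroes all directions beyond the first ell.
    """
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    delta = s[ell] ** 2 if len(s) > ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    out = np.zeros((ell, stacked.shape[1]))
    k = min(ell, len(s))
    out[:k] = s_shrunk[:k, None] * vt[:k]
    return out


def sketch_stream(rows, ell):
    """Sketch a stream of row vectors, holding at most 2 * ell rows at once."""
    rows = np.asarray(rows, dtype=float)
    sketch = np.zeros((ell, rows.shape[1]))
    for start in range(0, rows.shape[0], ell):
        batch = rows[start:start + ell]
        sketch = shrink(np.vstack([sketch, batch]), ell)
    return sketch


def merge_sketches(sketch_a, sketch_b, ell):
    """Mergeability: combine two sketches without access to the original rows."""
    return shrink(np.vstack([sketch_a, sketch_b]), ell)
```

Under these assumptions, a sketch of the old vectors can be stored in place of the vectors themselves, and `merge_sketches(old_sketch, sketch_stream(new_rows, ell), ell)` folds in later arrivals. How SANE derives node embeddings from its sketch is not described in the abstract and is not modeled here.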