UniNet: Scalable Network Representation Learning with Metropolis-Hastings Sampling

Xingyu Yao, Yingxia Shao, Bin Cui, Lei Chen
DOI: https://doi.org/10.1109/icde51399.2021.00051
2020-01-01
Abstract: Network representation learning (NRL) has been successfully adopted in various data mining and machine learning applications. Random walk based NRL is a popular paradigm that uses a set of random walks to capture structural information about the network and then employs word2vec models to learn low-dimensional representations. However, there is still no framework that unifies existing random walk based NRL models and learns efficiently from large networks. The main obstacles are the diversity of random walk models and the inefficiency of the sampling methods used for random walk generation. In this paper, we first introduce a new and efficient edge sampler based on the Metropolis-Hastings sampling technique, and theoretically show that the edge sampler converges to arbitrary discrete probability distributions. We then propose a random walk model abstraction in which users can easily define different transition probabilities by specifying dynamic edge weights and random walk states. Our edge sampler supports this abstraction efficiently because it can draw samples from an unnormalized probability distribution in constant time. Finally, with the new edge sampler and random walk model abstraction, we carefully implement a scalable NRL framework called UniNet. We conduct extensive experiments with five random walk based NRL models over eleven real-world datasets, and the results verify the efficiency of UniNet on billion-edge networks.
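To make the idea of constant-time sampling from an unnormalized distribution concrete, the following is a minimal Python sketch of a Metropolis-Hastings sampler over the outgoing edges of one node. It is an illustration only, not the UniNet implementation: the function names `mh_edge_samples` and `empirical_distribution` are hypothetical, and the uniform proposal over edges is an assumption made for simplicity. Each step costs O(1) and only compares two unnormalized weights, so no normalizing constant or precomputed table is needed.

```python
import random
from collections import Counter


def mh_edge_samples(weights, steps, rng):
    """Yield edge indices whose long-run frequencies approach weights / sum(weights).

    `weights` holds unnormalized, non-negative edge weights (assumed here to be
    a plain list). The proposal is uniform over all edges, which is symmetric,
    so the acceptance probability reduces to min(1, w[candidate] / w[current]).
    """
    n = len(weights)
    current = rng.randrange(n)  # arbitrary initial state of the chain
    for _ in range(steps):
        candidate = rng.randrange(n)  # O(1) uniform proposal
        # Accept with probability min(1, w[candidate] / w[current]);
        # written as a product to avoid dividing by a zero weight.
        if rng.random() * weights[current] < weights[candidate]:
            current = candidate
        yield current


def empirical_distribution(weights, steps, seed):
    """Run the chain and return the observed frequency of each edge index."""
    rng = random.Random(seed)
    counts = Counter(mh_edge_samples(weights, steps, rng))
    return [counts[i] / steps for i in range(len(weights))]


if __name__ == "__main__":
    # Unnormalized weights; the target distribution is [0.1, 0.2, 0.3, 0.4].
    print(empirical_distribution([1.0, 2.0, 3.0, 4.0], steps=200_000, seed=0))
```

Because consecutive samples come from a Markov chain, they are correlated, and the frequencies match the target distribution only in the limit. In a random walk setting the weights would typically be computed on demand from the current walk state, which is why avoiding normalization matters.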