Understanding Negative Sampling in Graph Representation Learning

Zhen Yang, Ming Ding, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang
DOI: https://doi.org/10.1145/3394486.3403218
2020-01-01
Abstract: Graph representation learning has been extensively studied in recent years. Despite its potential for generating continuous embeddings for various networks, inferring high-quality representations for a large corpus of nodes both effectively and efficiently remains challenging. Sampling is critical to achieving these performance goals. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge the gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated to the positive sampling distribution. Guided by the theory, we propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling with Metropolis-Hastings. We evaluate our method on 5 datasets that cover a wide range of downstream graph learning tasks, including link prediction, node classification, and personalized recommendation, for a total of 19 experimental settings. These relatively comprehensive experimental results demonstrate its robustness and superiority.
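
To make the abstract's two central ideas concrete, here is a minimal Python sketch of drawing negatives from a distribution that grows sub-linearly with an estimate of the positive distribution, using a Metropolis-Hastings chain so that no normalizing constant is needed. It is an illustration under stated assumptions, not the MCNS implementation: the function name mh_negative_samples, the exponent alpha = 0.75, the uniform proposal, and the burn-in length are all hypothetical choices, and the abstract does not specify any of them.

```python
import random


def mh_negative_samples(scores, num_samples, alpha=0.75, burn_in=100, seed=0):
    """Draw node indices with probability roughly proportional to scores[i] ** alpha.

    scores: unnormalized, strictly positive estimates of the positive
            sampling distribution (hypothetical input for this sketch).
    alpha:  exponent in (0, 1) giving a positive but sub-linear dependence
            (assumed value, not taken from the abstract).
    """
    rng = random.Random(seed)
    n = len(scores)
    current = rng.randrange(n)  # arbitrary starting state of the chain
    samples = []
    for step in range(burn_in + num_samples):
        proposal = rng.randrange(n)  # symmetric uniform proposal (assumption)
        # With a symmetric proposal, the acceptance ratio reduces to
        # target(proposal) / target(current); the normalizer cancels.
        ratio = (scores[proposal] / scores[current]) ** alpha
        if rng.random() < min(1.0, ratio):
            current = proposal
        if step >= burn_in:
            samples.append(current)
    return samples


if __name__ == "__main__":
    # Toy scores for four nodes; higher score means "more likely positive".
    toy_scores = [1.0, 2.0, 4.0, 8.0]
    negatives = mh_negative_samples(toy_scores, num_samples=10)
    print(negatives)
```

Because only ratios of scores enter the acceptance step, the sampler never has to sum over all nodes, which is the kind of saving that makes Metropolis-Hastings attractive on large graphs. Consecutive samples from a single chain are correlated, so a practical system would need to account for that.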