Uncovering multi-order Popularity and Similarity Mechanisms in Link Prediction by graphlet predictors

Yong-Jian He, Yijun Ran, Zengru Di, Tao Zhou, Xiao-Ke Xu
2024-10-07
Abstract: Link prediction has become a critical problem in network science and has thus attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles of popularity and similarity mechanisms in link prediction across various domain networks remain poorly understood. Accordingly, this study used orbit degrees of graphlets to construct multi-order popularity- and similarity-based network link predictors, demonstrating that traditional popularity- and similarity-based indices can be efficiently represented in terms of orbit degrees. Moreover, we designed a supervised learning model that fuses multiple orbit-degree-based features and validated its link prediction performance. We also evaluated the mean absolute Shapley additive explanations of each feature within this model across 550 real-world networks from six domains. We observed that the homophily mechanism, which is a similarity-based feature, dominated social networks, with its win rate being 91%. Moreover, a different similarity-based feature was prominent in economic, technological, and information networks. Finally, no single feature dominated the biological and transportation networks. The proposed approach improves the accuracy and interpretability of link prediction, thus facilitating the analysis of complex networks.
Social and Information Networks, Physics and Society
What problem does this paper attempt to address?
This paper asks how multi-order popularity and similarity mechanisms can be understood and used to improve the accuracy of link prediction. Specifically, the authors explore the roles of these mechanisms in networks from different domains and propose a method based on graph-structure features to improve on existing link-prediction models.

### Problem Background

Link prediction is a key problem in network science. Its goal is to infer, from partial network-structure information, whether a link is likely to exist between a pair of nodes. Solving it helps evaluate and compare network-evolution models and deepens the understanding of the organizing principles of complex networks. Link prediction therefore has applications in many fields, such as friend recommendation in social networks, product recommendation on e-commerce websites, and guidance for biological experiments.

### Research Motivation

Existing link-prediction methods rely mainly on two basic mechanisms: the **popularity mechanism** and the **similarity mechanism**. However, the specific roles of these mechanisms in networks from different domains remain unclear. In addition, although some studies have used higher-order information for link prediction, the role of this information and its relationship with lower-order information are also unclear.

### Paper Contributions

1. **Propose a New Framework**: The authors propose a link-prediction framework based on graph-structure features, using the orbit degrees of graphlets to quantify multi-order popularity and similarity features.
2. **Integrate Machine Learning**: These features are integrated into a supervised-learning model, and its performance on the link-prediction task is validated against benchmark methods.
3. **Reveal the Roles of Mechanisms**: By analyzing feature importance, the authors reveal the specific roles of multi-order popularity and similarity mechanisms in different networks. The results show that lower-order similarity features play the major role in most networks, while higher-order features play a supplementary role.
4. **Cross-Domain Applicability**: The framework is not limited to link prediction; it can also be applied to other key problems in network science, such as identifying important nodes and analyzing network-propagation dynamics.

### Main Findings

- **Lower-Order Features Are Dominant**: Lower-order similarity features (such as M2) play a dominant role in most networks, especially in social networks, which reflects the importance of the homophily mechanism.
- **Higher-Order Features Are Supplementary**: Higher-order similarity and multi-order popularity features play a supplementary role in technology, economic, and information networks.
- **Significant Performance Improvement**: The proposed orbit-degree-based model performs significantly better than the benchmark methods on 550 real-world networks, in terms of metrics such as AUC, recall, and F1-score.

In conclusion, the paper improves the accuracy and interpretability of link prediction by combining graph-structure features with machine-learning methods, and it provides new perspectives and tools for complex-network analysis.
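
To make the distinction between the two mechanisms concrete, the sketch below scores every unconnected node pair in a toy graph with one traditional popularity index (preferential attachment, the product of the two node degrees) and one traditional similarity index (the number of common neighbors). These are the classic low-order indices that the abstract says can be expressed in terms of orbit degrees; the code is not the authors' orbit-degree features or their supervised model. The function names and the five-edge graph are assumptions chosen for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def popularity_score(adj, u, v):
    """Preferential attachment: product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def similarity_score(adj, u, v):
    """Common neighbors: number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def score_non_edges(adj):
    """Return {(u, v): (popularity, similarity)} for every unconnected pair."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = (
                popularity_score(adj, u, v),
                similarity_score(adj, u, v),
            )
    return scores


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
scores = score_non_edges(adj)

for pair, (pop, sim) in sorted(scores.items()):
    print(pair, "popularity =", pop, "similarity =", sim)
```

In this toy graph the pair `("a", "e")` has a nonzero popularity score but no common neighbor, so the two indices rank candidate links differently. A supervised model that receives both kinds of features can learn which one matters for a given network, which is the idea the feature-importance analysis in the paper builds on.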