H2CGL: Modeling Dynamics of Citation Network for Impact Prediction

Guoxiu He, Zhikai Xue, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu
2023-10-15
Abstract: The potential impact of a paper is often quantified by how many citations it will receive. However, most commonly used models may underestimate the influence of newly published papers over time, and fail to encode the dynamics of the citation network in the graph. In this study, we construct hierarchical and heterogeneous graphs for target papers with an annual perspective. The constructed graphs record the annual dynamics of the target papers' scientific context information. Then, a novel graph neural network, the Hierarchical and Heterogeneous Contrastive Graph Learning Model (H2CGL), is proposed to incorporate the heterogeneity and dynamics of the citation network. H2CGL separately aggregates the heterogeneous information for each year and prioritizes the highly cited papers and the relationships among references, citations, and the target paper. It then employs a weighted GIN to capture dynamics between heterogeneous subgraphs over the years. Moreover, it leverages contrastive learning to make the graph representations more sensitive to potential citations. In particular, co-cited or co-citing papers of the target paper with a large citation gap are taken as hard negative samples, while randomly dropping low-cited papers generates positive samples. Extensive experimental results on two scholarly datasets demonstrate that the proposed H2CGL significantly outperforms a series of baseline approaches for both previously and freshly published papers. Additional analyses highlight the significance of the proposed modules. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/H2CGL).
Digital Libraries, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the problem of predicting the potential impact of academic papers, measured by the number of citations they will receive. Specifically:

1. **Limitations of Existing Models**:
   - Most existing models may underestimate the growing impact of newly published papers over time and fail to incorporate the temporal dynamics of citation networks.
2. **Constructing Hierarchical Heterogeneous Graphs**:
   - The paper constructs hierarchical and heterogeneous graphs for each target paper from an annual perspective. These graphs record the year-by-year changes in the target paper's scientific context.
3. **Proposing a New Model, H2CGL**:
   - A new graph neural network, H2CGL, is introduced to capture both the heterogeneity and the dynamics of the citation network. H2CGL aggregates heterogeneous information separately for each year, prioritizes highly cited papers and the relationships among references, citations, and the target paper, and then applies a weighted GIN to capture the dynamics between the yearly subgraphs.
4. **Enhancing Representations with Contrastive Learning**:
   - Contrastive learning is used to make graph representations more sensitive to potential citations. Co-cited or co-citing papers of the target paper with a large citation gap serve as hard negative samples, while randomly dropping low-cited papers generates positive samples.
5. **Experimental Validation**:
   - Extensive experiments on two scholarly datasets show that H2CGL significantly outperforms a series of baseline methods for both previously and freshly published papers. Additional analyses highlight the significance of the proposed modules.

### Summary

The paper aims to improve the prediction of a paper's future citation count by constructing hierarchical heterogeneous graphs and combining structural and temporal features, thereby assessing the impact of academic papers more accurately.
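
The contrastive sample construction described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `low_cited_max` and `min_gap` thresholds, the `drop_prob` parameter, and the toy citation counts are all assumptions made for illustration. The source states only that low-cited papers are randomly dropped to form positive samples and that co-cited or co-citing papers with a large citation gap are used as hard negatives.

```python
import random
from typing import Dict, List, Set


def make_positive_view(
    neighbors: Set[str],
    citations: Dict[str, int],
    low_cited_max: int,
    drop_prob: float,
    rng: random.Random,
) -> Set[str]:
    """Build a positive view by randomly dropping low-cited neighbor papers.

    A neighbor counts as "low-cited" if it has at most `low_cited_max`
    citations (hypothetical threshold). Highly cited neighbors are always kept.
    """
    kept = set()
    for paper in sorted(neighbors):  # sorted for reproducible RNG consumption
        is_low_cited = citations.get(paper, 0) <= low_cited_max
        if is_low_cited and rng.random() < drop_prob:
            continue  # drop this low-cited paper from the view
        kept.add(paper)
    return kept


def pick_hard_negatives(
    target: str,
    related: List[str],
    citations: Dict[str, int],
    min_gap: int,
) -> List[str]:
    """Select hard negatives among co-cited or co-citing papers.

    A related paper qualifies if its citation count differs from the
    target's by at least `min_gap` (hypothetical threshold).
    """
    target_count = citations[target]
    return [
        paper
        for paper in related
        if paper != target
        and abs(citations.get(paper, 0) - target_count) >= min_gap
    ]


# Toy data (invented for illustration only).
citations = {"target": 50, "a": 2, "b": 120, "c": 48, "d": 1, "e": 300}
neighbors = {"a", "b", "c", "d"}

positive_view = make_positive_view(
    neighbors, citations, low_cited_max=5, drop_prob=1.0, rng=random.Random(0)
)
hard_negatives = pick_hard_negatives(
    "target", ["a", "c", "e"], citations, min_gap=40
)
```

With `drop_prob=1.0` every low-cited neighbor is removed, so the positive view keeps only `b` and `c`. In practice a value below 1 would produce different views on each call. The hard negatives here are `a` and `e`, whose citation counts are far from the target's, while `c` is excluded because its count is close to the target's.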