The Solution for The PST-KDD-2024 OAG-Challenge

Shupeng Zhong, Xinger Li, Shushan Jin, Yang Yang
2024-07-02
Abstract: In this paper, we introduce the second-place solution in the KDD-2024 OAG-Challenge paper source tracing track. Our solution is based on two methods, BERT and GCN, and combines the inference results of both in the final submission so that the two models complement each other. In the BERT solution, we focus on processing the fragments of the paper in which references appear, and apply a variety of operations to reduce redundant interference in those fragments, so that the information BERT receives is more refined. In the GCN solution, we map information such as paper fragments, abstracts, and titles to a high-dimensional semantic space through an embedding model, and build edges between titles, abstracts, and fragments to integrate contextual relationships into the judgment. In the end, our solution achieved a score of 0.47691 in the competition.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of tracing source papers in academic literature. The authors entered the paper source tracing track of the KDD-2024 OAG-Challenge and finished in 2nd place. Specifically, the paper attempts to solve the following problems:

1. **How to identify source papers that made inspirational contributions to the current paper from a large number of documents**: With the rapid development of science and technology, the number of academic papers has increased dramatically, so it is very difficult to pick out the source papers that significantly influenced the ideas or methods of a particular paper.
2. **How to identify source papers efficiently and accurately**: Meeting this challenge requires methods or algorithms that can identify source papers both effectively and accurately.

To solve these problems, the authors adopted two main methods, a BERT-based text classification method and a Graph Convolutional Network (GCN)-based node classification method, and combined the results of the two so that they complement each other. Specifically:

- **BERT-based text classification method**: The fragments in which references appear are processed with various operations that reduce redundant interference, so the BERT model receives more refined information and can better understand the context in which each reference is cited in the original text.
- **GCN-based node classification method**: Information such as paper fragments, abstracts, and titles is mapped into a high-dimensional semantic space, and edges are constructed between titles, abstracts, and fragments to integrate contextual relationships into the judgment.

In the end, the solution proposed by the authors achieved an AP value of 0.47691 in the competition.
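The idea of combining the two models' outputs and scoring the result with average precision can be illustrated with a minimal sketch. The summary does not say how the BERT and GCN results are actually merged, so the weighted average in `fuse_scores`, the weight `w_bert`, the function names, and all toy scores and labels below are assumptions made purely for illustration, not the authors' implementation.

```python
from typing import Dict


def fuse_scores(bert: Dict[str, float], gcn: Dict[str, float],
                w_bert: float = 0.5) -> Dict[str, float]:
    """Weighted average of two score maps keyed by reference id.

    This fusion rule is a hypothetical stand-in; the source only says
    that the BERT and GCN results are combined.
    """
    if bert.keys() != gcn.keys():
        raise ValueError("both models must score the same references")
    return {ref: w_bert * bert[ref] + (1.0 - w_bert) * gcn[ref] for ref in bert}


def average_precision(scores: Dict[str, float], labels: Dict[str, int]) -> float:
    """Average precision of the ranking induced by `scores`.

    References are sorted by descending score; precision is accumulated
    at each rank where a true source paper (label 1) appears.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, ref in enumerate(ranked, start=1):
        if labels.get(ref, 0) == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example: made-up scores for the four references of one paper.
# r1 and r3 are the true source papers.
labels = {"r1": 1, "r2": 0, "r3": 1, "r4": 0}
bert_scores = {"r1": 0.9, "r2": 0.2, "r3": 0.6, "r4": 0.7}
gcn_scores = {"r1": 0.5, "r2": 0.6, "r3": 0.8, "r4": 0.1}

fused = fuse_scores(bert_scores, gcn_scores, w_bert=0.5)
ap_bert = average_precision(bert_scores, labels)
ap_gcn = average_precision(gcn_scores, labels)
ap_fused = average_precision(fused, labels)
print(ap_bert, ap_gcn, ap_fused)
```

In this toy data each model ranks one non-source reference above a true source, but the two models make different mistakes, so the averaged scores rank both true sources first. Real data will not behave this cleanly; the example only shows why combining two models with non-overlapping errors can help.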