Feature Engineering and Tree Modeling for Author-Paper Identification Challenge

Jiefei Li, Xiaocong Liang, Weijie Ding, Weidong Yang, Rong Pan
DOI: https://doi.org/10.1145/2517288.2517294
2013-01-01
Abstract: The ability to search literature and to collect and aggregate metrics around publications is a central tool for modern research. Both academic and industry researchers across hundreds of scientific disciplines, from astronomy to zoology, increasingly rely on search to understand what has been published and by whom. Microsoft Academic Search is an open platform that provides a variety of metrics and experiences for the research community, in addition to literature search. Because the underlying data comes from many sources, the profile of an author with an ambiguous name tends to contain noise, resulting in papers being assigned to the wrong author. KDD Cup 2013 Track 1 challenges participants to determine which papers in an author profile were truly written by the given author. In this work, we show how tree-based models can be used to predict accurately whether a paper was written by a given author. We combine feature engineering with these models to take advantage of their strengths. This paper introduces two kinds of tree-based models (GBDT [4], RGF [5]) and presents in detail the learning algorithm and how features can be generated for the task. The experimental results show the effectiveness of the proposed approach.