Citation Metadata Extraction Via Deep Neural Network-based Segment Sequence Labeling

Dong An, Liangcai Gao, Zhuoren Jiang, Runtao Liu, Zhi Tang
DOI: https://doi.org/10.1145/3132847.3133074
2017-01-01
Abstract: Citation metadata extraction plays an important role in academic information retrieval and knowledge management. Existing work on this task generally uses rule-based, template-based, or learning-based approaches, but these methods usually either rely on handcrafted features or are limited to specific domains. Recently, neural networks have shown strong ability in addressing sequence labeling tasks. In this paper, we propose a sequence labeling model for citation metadata extraction, called segment sequence labeling. Instead of inferring labels at the word level, the model first divides the input sequence into segments and then computes features of the segments to infer the label sequence of the segments. We first run experiments to validate the effectiveness of the different parts of the model by comparing it with a CRF-based model and a neural network-based model. Experimental results show that our model outperforms both on most fields. In addition, our model is evaluated on the public datasets UMass [1] and Cora [12], where it achieves a significant performance improvement. Our model was trained on data generated from BibTeX files collected from the Web and annotated automatically.
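The segment-level pipeline described in the abstract (divide the input into segments, compute features per segment, then label the segments) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not say how segments are formed, so the punctuation-based delimiter set is hypothetical, and the hand-written features are only a stand-in for the neural feature computation the paper proposes. The labeling step itself is omitted.

```python
import re

# Hypothetical delimiter set: the abstract does not specify how the
# input sequence is divided into segments.
SEGMENT_DELIMITERS = re.compile(r"[,.;:()]")


def split_into_segments(citation: str) -> list[str]:
    """Split a citation string into non-empty, whitespace-stripped segments."""
    return [s.strip() for s in SEGMENT_DELIMITERS.split(citation) if s.strip()]


def segment_features(segment: str) -> dict:
    """Toy per-segment features (a stand-in for learned neural features)."""
    tokens = segment.split()
    return {
        "num_tokens": len(tokens),
        "has_digit": any(ch.isdigit() for ch in segment),
        "capitalized_ratio": sum(t[0].isupper() for t in tokens) / len(tokens),
    }


# Illustrative input string, not taken from any dataset.
citation = "Dong An, Liangcai Gao. Citation metadata extraction. CIKM, 2017"
segments = split_into_segments(citation)
features = [segment_features(s) for s in segments]

# A sequence labeler would now assign one label per segment
# (for example author, title, venue, year) instead of one per word.
for seg, feat in zip(segments, features):
    print(seg, feat)
```

The point of the sketch is only the change of granularity: the label sequence has one entry per segment, so it is much shorter than a word-level label sequence for the same citation.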