ZoomNet for Topic-Oriented Fragment Recognition in Long Documents

Yukun Yan, Daqi Zheng, Zhengdong Lu, Sen Song
DOI: https://doi.org/10.1109/access.2022.3166235
IF: 3.9
2022-01-01
IEEE Access
Abstract: This work introduces a new information extraction task, Topic-Oriented Fragment Recognition (TOFR), whose goal is to recognize information related to a specific topic in long documents from professional fields. We introduce two TOFR datasets to study the problems of processing long documents. We propose a novel neural framework, the Zooming Network (ZoomNet), which addresses the challenge of combining information over long distances under limited computing resources by flexibly switching between skimming and intensive reading. ZoomNet first builds a hierarchical representation aligned with the text structure, which relieves the conflict between local information and extensive contextual information. It then synthesizes information from different levels to assign tags via multi-scale actions. The model is trained with a combination of supervised and reinforcement learning. Experiments show that the proposed model outperforms several state-of-the-art sequence labeling models, including BiLSTM-CRF, BERT, XLNet, RoBERTa, and ELECTRA, by large margins on both TOFR datasets.
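The skim-or-zoom idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: ZoomNet learns its decisions with supervised and reinforcement learning over neural representations, whereas the `toy_policy` below is a hand-written keyword rule. The three-level document layout, the function names, the tag labels `T`/`O`, and the convention that returning `None` means "zoom in" are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Union

# Hypothetical nested document: paragraphs -> sentences -> words.
Unit = Union[str, list]
# A policy sees a unit's text and its level (0 = paragraph, 1 = sentence,
# 2 = word). It returns a tag for the whole unit, or None to zoom in.
Policy = Callable[[str, int], Optional[str]]


def flatten(unit: Unit) -> List[str]:
    """Collect every word under a paragraph, sentence, or single word."""
    if isinstance(unit, str):
        return [unit]
    words: List[str] = []
    for child in unit:
        words.extend(flatten(child))
    return words


def zoom_tag(unit: Unit, level: int, policy: Policy) -> List[str]:
    """Return one tag per word, tagging coarsely where the policy allows."""
    words = flatten(unit)
    decision = policy(" ".join(words), level)
    if decision is not None:
        # Skimming: one action tags the whole unit at once.
        return [decision] * len(words)
    if isinstance(unit, str):
        # Nothing finer than a word; fall back to the "outside" tag.
        return ["O"]
    # Intensive reading: descend one level and decide again per child.
    tags: List[str] = []
    for child in unit:
        tags.extend(zoom_tag(child, level + 1, policy))
    return tags


def tag_document(paragraphs: list, policy: Policy) -> List[str]:
    """Tag a whole document, starting at paragraph level."""
    tags: List[str] = []
    for paragraph in paragraphs:
        tags.extend(zoom_tag(paragraph, 0, policy))
    return tags


def toy_policy(text: str, level: int) -> Optional[str]:
    """Keyword rule standing in for a learned policy (assumption)."""
    has_topic = "fee" in text.split()
    if level < 2:
        # Skip units without the keyword; zoom into units that have it.
        return None if has_topic else "O"
    return "T" if has_topic else "O"
```

Here a paragraph with no topic keyword is tagged in a single action, while a paragraph containing the keyword is opened up sentence by sentence and, where needed, word by word. That is the sense in which coarse and fine actions share one tagging pass.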