Improving Open Information Extraction with Distant Supervision Learning

Han Jiabao,Wang Hongzhi
DOI: https://doi.org/10.1007/s11063-021-10548-0
IF: 2.565
2021-01-01
Neural Processing Letters
Abstract:Open information extraction (Open IE), as one of the essential applications in the area of Natural Language Processing (NLP), has gained great attention in recent years. As a critical technology for building Knowledge Bases (KBs), it converts unstructured natural language sentences into structured representations, usually expressed in the form of triples. Most conventional open information extraction approaches leverage a series of manual pre-defined extraction patterns or learn patterns from labeled training examples, which requires a large number of human resources. Additionally, many Natural Language Processing tools are involved, which leads to error accumulation and propagation. With the rapid development of neural networks, neural-based models can minimize the error propagation problem, but it also faces the problem of data-hungry in supervised learning. Especially, they leverage existing Open IE tools to generate training data, and it causes data quality issues. In this paper, we employ a distant supervision learning approach to improve the Open IE task. We conduct extensive experiments by employing two popular sequence-to-sequence models (RNN and Transformer) and a large benchmark data set to demonstrate the performance of our approach.
What problem does this paper attempt to address?