Using Distant Supervision and Paragraph Vector for Large Scale Relation Extraction

Yuming Liu, Weiran Xu
DOI: https://doi.org/10.1007/978-981-10-0457-5_17
2016-01-01
Abstract: Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been introduced into distant supervision to combat noisy data and improve relation extraction performance. However, multi-instance multi-label learning uses only hidden variables when inferring relations between entities, so it cannot make full use of the training data. In addition, traditional lexical and syntactic features poorly reflect domain knowledge and the global information of a sentence, which limits system performance. This paper presents a novel approach to multi-instance multi-label learning that borrows the idea of fuzzy classification. We use cluster centers as training data, which lets us make adequate use of sentence-level features. We also extend the feature set with paragraph vectors, which carry the semantic information of sentences. We conduct an extensive empirical study to verify our contributions. The results show that our method outperforms the state-of-the-art distantly supervised baseline.