[Figure 1 labels: layers: Feature Extracting, Feature Clustering, Relation Predicting; challenges: (1) Overlapping Relations, (2) Discrete Features]
Xinsong Zhang,Pengshuai Li,Weijia Jia,Hai Zhao
2018-01-01
Abstract: Disclosing multiple overlapping relations in a sentence remains challenging. Most current neural models inconveniently assume that each sentence is explicitly mapped to a single relation label, so they cannot handle multiple relations properly: the overlapped features of the relations are either ignored or very difficult to identify. To tackle this issue, we propose a novel approach for multi-labeled relation extraction with a capsule network, which performs considerably better than current convolutional or recurrent networks in identifying highly overlapped relations within an individual sentence. To better cluster the features and precisely extract the relations, we further devise an attention-based routing algorithm and a sliding-margin loss function, and embed them into our capsule network. The experimental results show that the proposed approach can indeed extract highly overlapped features and achieves a significant performance improvement for relation extraction compared to state-of-the-art works.

Introduction

Relation extraction plays a crucial role in many natural language processing (NLP) tasks. It aims to identify relation facts for pairs of entities in a sentence to construct triples like [Arthur Lee, place born, Memphis]. Relation extraction has received renewed interest in the neural network era, as neural models are effective at extracting the semantic meanings of relations. Compared with traditional approaches, which focus on manually designed features, neural methods such as Convolutional Neural Networks (CNN) (Liu et al. 2013; Zeng et al. 2014) and Recurrent Neural Networks (RNN) (Zhang and Wang 2015; Zhou et al. 2016) have achieved significant improvement in relation classification. However, previous neural models are unlikely to scale to the scenario where a sentence has multiple relation labels, and they face challenges in extracting highly overlapped and discrete relation features due to the following two drawbacks.
First, one entity pair can express multiple relations in a sentence, which seriously confuses the relation extractor. For example, as in Figure 1, the entity pair [Arthur Lee, Memphis] has three possible relations: place birth, place death and place lived. Sentences S1 and S2 can both express two relations, and sentence S3 represents another two relations. These sentences contain multiple kinds of relation features, which are difficult to identify clearly. Existing neural models tend to merge low-level semantic meanings into one high-level relation representation vector with methods such as max-pooling (Zeng et al. 2014; Zhang, Zhao, and Qin 2016) and word-level attention (Zhou et al. 2016). However, one high-level relation vector is still insufficient to express multiple relations precisely.

Second, current methods neglect the discreteness of relation features. For instance, as shown in Figure 1, all the sentences express their relations with a few significant words (italicized in the figure) distributed discretely in the sentences. However, common neural methods process sentences with fixed structures, which makes it difficult to gather relation features from different positions. For example, being spatially sensitive, CNNs adopt convolutional feature detectors to extract local patterns from a sliding window of vector sequences and use max-pooling to select the prominent ones. Besides, the feature distribution of “no relation (NA, others)” in a dataset is different from that of definite relations. A sentence can be classified as “no relation” only when it does not contain any features of other relations.

∗Corresponding authors: Weijia Jia, Hai Zhao, {jia-wj, zhaohai}@cs.sjtu.edu.cn
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
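To make the max-pooling limitation described above concrete, here is a minimal sketch of a 1D convolution followed by max-over-time pooling. It is not the authors' code: the toy dimensions, the random weights, and the function name `conv_max_pool` are assumptions chosen for illustration. It shows that pooling collapses a sentence of any length into one fixed-size vector and drops the positions at which each feature detector fired.

```python
import numpy as np

def conv_max_pool(word_vectors, filters):
    """1D convolution over a sentence followed by max-over-time pooling.

    word_vectors: (seq_len, emb_dim) array, one row per token.
    filters: (num_filters, window, emb_dim) array of feature detectors.
    Returns (pooled, positions): a (num_filters,) vector and, for each
    filter, the window index where its maximum response occurred.
    """
    seq_len, _ = word_vectors.shape
    num_filters, window, _ = filters.shape
    num_windows = seq_len - window + 1
    # Response of every filter at every sliding-window position.
    responses = np.empty((num_windows, num_filters))
    for start in range(num_windows):
        patch = word_vectors[start:start + window]  # (window, emb_dim)
        responses[start] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    pooled = responses.max(axis=0)        # one value per filter
    positions = responses.argmax(axis=0)  # ordinary pooling throws this away
    return pooled, positions

# Hypothetical toy setup: 4 filters, window of 3 tokens, 5-dim embeddings.
rng = np.random.default_rng(0)
filters = rng.normal(size=(4, 3, 5))
short_sentence = rng.normal(size=(6, 5))
long_sentence = rng.normal(size=(15, 5))

pooled_short, _ = conv_max_pool(short_sentence, filters)
pooled_long, pos_long = conv_max_pool(long_sentence, filters)
print(pooled_short.shape, pooled_long.shape)
```

Both sentences map to a vector of the same size, so several relations expressed at different positions have to share that single representation.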
In this paper, to extract overlapped and discrete relation features, we propose a novel approach for multi-labeled relation extraction with an attentive capsule network. As shown in Figure 1, the relation extractor of the proposed method is constructed with three major layers: feature extracting, feature clustering and relation predicting. The first layer extracts low-level semantic meanings. The second layer clusters low-level features into high-level relation representations, and the final layer predicts relation types for each relation representation. The low-level features are extracted with traditional neural models such as Bidirectional Long Short-Term Memory (Bi-LSTM) and CNN. For the feature clustering layer, we utilize an attentive capsule network inspired by Sabour, Frosst, and Hinton (2017). A capsule (vector) is a small group of neurons used to express features. Its overall length indicates the significance of the features, and the direction of a capsule suggests the specific property of the feature. The low-level semantic meanings from the first layer are embedded into a number of low-level capsules, which will

arXiv:1811.04354v1 [cs.CL] 11 Nov 2018

ID | Instances | Relations
S1 | [Arthur Lee], the leader of Love, died on Thursday in [Memphis]. | person/place_death