COTET: Cross-view Optimal Transport for Knowledge Graph Entity Typing

Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan
2024-05-22
Abstract: Knowledge graph entity typing (KGET) aims to infer missing entity type instances in knowledge graphs. Previous research has predominantly centered around leveraging contextual information associated with entities, which provides valuable clues for inference. However, they have long ignored the dual nature of information inherent in entities, encompassing both high-level coarse-grained cluster knowledge and fine-grained type knowledge. This paper introduces Cross-view Optimal Transport for knowledge graph Entity Typing (COTET), a method that effectively incorporates the information on how types are clustered into the representation of entities and types. COTET comprises three modules: i) Multi-view Generation and Encoder, which captures structured knowledge at different levels of granularity through entity-type, entity-cluster, and type-cluster-type perspectives; ii) Cross-view Optimal Transport, transporting view-specific embeddings to a unified space by minimizing the Wasserstein distance from a distributional alignment perspective; iii) Pooling-based Entity Typing Prediction, employing a mixture pooling mechanism to aggregate prediction scores from diverse neighbors of an entity. Additionally, we introduce a distribution-based loss function to mitigate the occurrence of false negatives during training. Extensive experiments demonstrate the effectiveness of COTET when compared to existing baselines.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the Knowledge Graph Entity Typing (KGET) task, which involves inferring missing entity type instances in a knowledge graph. Existing research primarily uses contextual information related to entities to infer types, but these methods have long overlooked the dual nature of the information inherent in entities: high-level, coarse-grained cluster knowledge and fine-grained type knowledge. The paper therefore proposes Cross-view Optimal Transport for knowledge graph Entity Typing (COTET), an approach that integrates type clustering information into the representations of entities and types.

### Background and Motivation

Knowledge Graphs (KGs) represent factual knowledge through triples (e, r, f), where entities e and f are connected by a relation type r. KGs also contain entity type assertions (e, hastype, t), indicating that entity e has type t. For example, the entity Lionel Messi has types including Argentine footballer and Inter Miami CF player. Entity type information is crucial in many applications, such as knowledge graph completion, question answering, and entity alignment. However, existing knowledge graphs like FB15k and YAGO43k, although rich in type knowledge, still have significantly incomplete coverage. For instance, Lionel Messi should also have the type FC Barcelona player, but this information is missing in the actual data.

### Limitations of Existing Methods

- **Embedding Methods**: Encode all neighbors of the target entity into a single vector, although often only a subset of neighbors is necessary for correct type inference.
- **Graph Neural Network (GNN) Methods**: Represent information based on neighbors but primarily aggregate information along paths starting from the target entity's neighbors, making it difficult to capture interactions between neighbors that are not directly connected.
- **Transformer Methods**: Rely on computationally intensive Transformer structures to encode entities and their relations, as well as known type neighbors, to infer missing types.

More importantly, all existing methods consider knowledge from a single perspective.

### Contributions of the Paper

1. **Multi-View Generation and Encoding**: Introduces an entity-type view, an entity-cluster view, and a type-cluster view to generate and encode knowledge at different levels of abstraction.
2. **Cross-View Optimal Transport**: Aligns embeddings from the different views in a unified space by minimizing the Wasserstein distance between their distributions.
3. **Hybrid Pooling Strategy**: Combines prediction scores from an entity's different neighbors to form the final result.
4. **Distribution-Based Loss Function**: Designs a cross-entropy loss function based on the Beta probability distribution to mitigate false negatives during training.

### Experimental Validation

The paper validates COTET through extensive experiments on two real-world knowledge graphs and conducts detailed ablation studies. The reported results show that COTET outperforms existing baseline methods across multiple metrics.
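
To make the cross-view optimal transport idea more concrete, the following is a minimal sketch of aligning two sets of view-specific embeddings by computing an optimal transport plan between them. It is not the paper's implementation: this summary does not give COTET's exact formulation, so the sketch swaps in the standard entropy-regularized Sinkhorn iteration as an approximation of the Wasserstein objective. The function names (`sinkhorn_plan`, `transport_cost`), the toy 2-D embeddings, the uniform weights, and the `reg` value are all assumptions made for illustration.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_plan(source, target, reg=0.5, n_iters=200):
    """Approximate OT plan between two uniformly weighted embedding sets.

    Hypothetical helper: uses Sinkhorn iterations with entropic
    regularization `reg`, a common stand-in for exact Wasserstein OT.
    Returns (plan, cost), both as n x m nested lists.
    """
    n, m = len(source), len(target)
    a = [1.0 / n] * n  # uniform mass on source embeddings (assumption)
    b = [1.0 / m] * m  # uniform mass on target embeddings (assumption)
    cost = [[sq_dist(x, y) for y in target] for x in source]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, cost


def transport_cost(plan, cost):
    """Total cost <plan, cost>; smaller means the two views are better aligned."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))


# Toy embeddings for one view and two candidate counterparts (made-up values).
entity_type_view = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cluster_view_near = [[0.1, 0.0], [1.1, 0.0], [0.1, 1.0]]
cluster_view_far = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]

plan_near, cost_near = sinkhorn_plan(entity_type_view, cluster_view_near)
plan_far, cost_far = sinkhorn_plan(entity_type_view, cluster_view_far)
near = transport_cost(plan_near, cost_near)
far = transport_cost(plan_far, cost_far)
print(f"near-view cost: {near:.4f}, far-view cost: {far:.4f}")
```

In a trainable model, a transport cost of this kind would typically serve as a loss term so that gradient updates pull the view-specific embedding distributions together; here it is only evaluated on fixed toy points, where the nearby view yields a lower cost than the distant one.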