HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs

Umair Qudus, Michael Roeder, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo
DOI: https://doi.org/10.1007/978-3-031-19433-7_27
2024-09-11
Abstract: We consider fact-checking approaches that aim to predict the veracity of assertions in knowledge graphs. Five main categories of fact-checking approaches for knowledge graphs have been proposed in the recent literature, each of which is subject to partially overlapping limitations. In particular, current text-based approaches are limited by manual feature engineering. Path-based and rule-based approaches are limited by their exclusive use of knowledge graphs as background knowledge, and embedding-based approaches suffer from low accuracy scores on current fact-checking tasks. We propose a hybrid approach, dubbed HybridFC, that exploits the diversity of existing categories of fact-checking approaches within an ensemble learning setting to achieve a significantly better prediction performance. In particular, our approach outperforms the state of the art by 0.14 to 0.27 in terms of Area Under the Receiver Operating Characteristic curve on the FactBench dataset. Our code is open-source and can be found at https://github.com/dice-group/HybridFC.
Machine Learning, Artificial Intelligence, Databases
### What problem does this paper attempt to solve?

This paper addresses fact verification in knowledge graphs (KGs): predicting whether a statement in a knowledge graph is true or false. Five main categories of fact-verification methods exist (text-based, path-based, rule-based, embedding-based, and hybrid), and each has limitations:

1. **Text-based methods**: They rely on manual feature engineering, which is time-consuming and can limit prediction performance.
2. **Path-based methods**: They depend on short paths being available between entities in the knowledge graph, so they cannot reliably verify correct statements whose connecting paths are not obvious.
3. **Rule-based methods**: They are limited to the knowledge already present in the knowledge graph, and mining rules from large-scale knowledge graphs is very time-consuming.
4. **Embedding-based methods**: They achieve lower accuracy on fact-verification tasks and scale poorly to large knowledge graphs.

To address these problems, the authors propose a new hybrid fact-verification method, **HybridFC**. It combines the strengths of existing method categories within an ensemble-learning framework and achieves significantly better prediction performance. Specifically, on the FactBench dataset, HybridFC improves the area under the ROC curve (AUROC) by 0.14 to 0.27 over existing methods.

### How HybridFC Works

HybridFC mainly consists of the following components:

1. **Text-based component**: It uses a sentence-embedding model (such as SBert) to convert textual evidence into vector representations and combines them with the credibility scores of the sources to produce the final input vector.
2. **Path-based component**: It uses an existing path-based method (such as COPAAL) to calculate the credibility score of a given statement.
3. **KG-embedding-based component**: Using pre-trained knowledge-graph-embedding models (such as TransE or ComplEx), it maps the subject, predicate, and object of a statement into a continuous vector space.
4. **Neural-network component**: It fuses the outputs of the three components above, feeds them into a multi-layer perceptron (MLP) module, and outputs a credibility score between 0 and 1 through a sigmoid function.

In this way, HybridFC exploits the strengths of the different method types and compensates for the limitations of any single one, yielding more accurate fact verification.

### Summary

The core problem of this paper is improving the prediction of whether statements in knowledge graphs are true. In response to the limitations of existing methods, it proposes a hybrid method, HybridFC, which significantly improves fact-verification accuracy by integrating text-based, path-based, and embedding-based methods through ensemble learning.
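
The fusion step described under "How HybridFC Works" can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hidden layer, toy vector sizes, and untrained random weights, and every name in it (`fuse`, `veracity_score`, `TEXT_DIM`, and so on) is hypothetical. The input vectors stand in for what a sentence-embedding model, a path-based scorer, and a pre-trained KG-embedding model would supply.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Hypothetical untrained parameters; a real model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def fuse(text_vec, path_score, subj_vec, pred_vec, obj_vec):
    """Concatenate the three component outputs into one feature vector."""
    return (list(text_vec) + [path_score]
            + list(subj_vec) + list(pred_vec) + list(obj_vec))


def veracity_score(features, hidden, output):
    """MLP forward pass: dense -> ReLU -> dense -> sigmoid."""
    h = [max(0.0, v) for v in dense(features, *hidden)]
    logit = dense(h, *output)[0]
    return sigmoid(logit)


rng = random.Random(0)
TEXT_DIM, KG_DIM, HIDDEN_DIM = 8, 4, 6  # toy sizes, assumed for illustration

# Placeholder inputs standing in for the three components' outputs.
text_vec = [rng.random() for _ in range(TEXT_DIM)]   # text-based component
path_score = 0.73                                    # path-based component
subj_vec, pred_vec, obj_vec = (
    [rng.random() for _ in range(KG_DIM)] for _ in range(3)
)                                                    # KG-embedding component

features = fuse(text_vec, path_score, subj_vec, pred_vec, obj_vec)
hidden = random_layer(len(features), HIDDEN_DIM, rng)
output = random_layer(HIDDEN_DIM, 1, rng)

score = veracity_score(features, hidden, output)
print(f"{len(features)} input features -> veracity score {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how heterogeneous evidence (a text vector, a scalar path score, and three KG embeddings) can be concatenated and squashed into a single value in (0, 1).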