Efficient and Reliable Estimation of Knowledge Graph Accuracy

Stefano Marchesin, Gianmaria Silvello
DOI: https://doi.org/10.14778/3665844.3665865
IF: 2.5
2024-05-01
Proceedings of the VLDB Endowment
Abstract: Data accuracy is a central dimension of data quality, especially when dealing with Knowledge Graphs (KGs). Auditing the accuracy of KGs is essential to make informed decisions in entity-oriented services or applications. However, manually evaluating the accuracy of large-scale KGs is prohibitively expensive, and research is focused on developing efficient sampling techniques for estimating KG accuracy. This work addresses the limitations of current KG accuracy estimation methods, which rely on the Wald method to build confidence intervals and therefore suffer from reliability issues such as zero-width and overshooting intervals. Our solution, rooted in the Wilson method and tailored for complex sampling designs, overcomes these limitations and ensures applicability across various evaluation scenarios. We show that the presented methods increase the reliability of accuracy estimates by up to two times when compared to the state-of-the-art while preserving or enhancing efficiency. Additionally, these improvements hold regardless of the KG size or topology.
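To make the two failure modes named in the abstract concrete, the following is a minimal sketch that compares the textbook Wald and Wilson intervals for a binomial proportion. It assumes simple random sampling of triples, so it is not the authors' adaptation to complex sampling designs. The function names and the sample counts in the usage note are illustrative assumptions, not taken from the paper.

```python
import math

# Two-sided 95% standard normal quantile (assumed confidence level).
Z_95 = 1.959963984540054


def wald_interval(correct, n, z=Z_95):
    """Textbook Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    Assumes simple random sampling. Bounds are deliberately not clipped
    to [0, 1] so that overshooting remains visible.
    """
    p_hat = correct / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(correct, n, z=Z_95):
    """Textbook Wilson score interval under the same sampling assumption."""
    p_hat = correct / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical annotation outcomes: (correct triples, sampled triples).
    for correct, n in [(30, 30), (29, 30)]:
        print(correct, n, wald_interval(correct, n), wilson_interval(correct, n))
```

With 30 of 30 sampled triples judged correct, the Wald interval collapses to a single point (zero width), and with 29 of 30 its upper bound exceeds 1 (overshooting). The Wilson interval keeps a positive width in the first case and stays below 1 in the second.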
computer science, information systems, theory & methods