Abstract: The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.

What problem does this paper attempt to address?

The paper addresses the challenge of keeping a knowledge graph complete and fresh as it is built. It proposes ODKE (Open Domain Knowledge Extraction), a scalable framework that extracts high-quality entities and facts from the open web to improve the graph's coverage and freshness. The paper points out that traditional knowledge graph construction relies on manual review, which is time-consuming, costly, and difficult to scale to large datasets. ODKE was therefore developed as an automated framework that continuously updates the facts in a knowledge graph.

The design of ODKE takes into account the following key challenges:

1. **Large Data Volume**: The amount of data and facts on the web is enormous and constantly growing, so the framework must process web-scale data.
2. **Data and Task Diversity**: The web contains various types of data, including plain text and semi-structured data. Extracting high-quality facts from these sources requires multiple types of extractors.
3. **High Accuracy**: Information on the web may contain errors or conflicting facts, such as different statements about a person's height. Some facts also change over time, so it is necessary to identify the most accurate and up-to-date facts.
4. **Timeliness**: Timely extraction of new knowledge from the web and its incorporation into the knowledge graph is crucial for many downstream applications.

To address these challenges, ODKE combines a range of extraction models and supports both streaming and batch processing, meeting the needs of different latency scenarios. It also addresses issues such as multilingual support and link inference, improving the system's scalability and data freshness.

In summary, the goal of this paper is to address the shortcomings of the existing knowledge graph construction process by proposing the ODKE framework, particularly in terms of data scale, diversity, accuracy, and timeliness, thereby enhancing the quality and usability of knowledge graphs.
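The "High Accuracy" challenge above (conflicting statements about, say, a person's height) can be illustrated with a small sketch. This is not ODKE's actual method, which the summary does not describe; it is a minimal, hypothetical resolution rule that assumes each extracted candidate carries an extractor confidence and an observation date. The `CandidateFact` type, the `resolve` function, the entity name, and the example sources are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CandidateFact:
    """Hypothetical record for one extracted statement (not an ODKE type)."""
    subject: str
    predicate: str
    value: str
    source: str
    observed: date
    confidence: float  # assumed extractor score in [0, 1]


def resolve(candidates):
    """Pick one value per (subject, predicate).

    Values are ranked by the summed confidence of the candidates that
    support them; ties are broken in favor of the most recently observed value.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for fact in candidates:
        grouped[(fact.subject, fact.predicate)][fact.value].append(fact)

    resolved = {}
    for key, by_value in grouped.items():
        def score(item):
            _, facts = item
            total = sum(f.confidence for f in facts)
            latest = max(f.observed for f in facts)
            return (total, latest)

        best_value, _ = max(by_value.items(), key=score)
        resolved[key] = best_value
    return resolved


candidates = [
    CandidateFact("Jane Doe", "height_cm", "170", "site-a.example", date(2023, 1, 5), 0.6),
    CandidateFact("Jane Doe", "height_cm", "172", "site-b.example", date(2024, 3, 1), 0.7),
    CandidateFact("Jane Doe", "height_cm", "170", "site-c.example", date(2022, 7, 9), 0.5),
]
print(resolve(candidates))
```

Here two weaker sources agreeing on "170" outweigh a single stronger source reporting "172". A production system would likely also weigh source reliability and treat time-varying facts differently from static ones; those refinements are outside what this sketch covers.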

Open Domain Knowledge Extraction for Knowledge Graphs
