Procedure Model for Building Knowledge Graphs for Industry Applications

Sascha Meckler
2024-09-20
Abstract: Enterprise knowledge graphs combine business data and organizational knowledge by means of a semantic network of concepts, properties, individuals and relationships. The graph-based integration of previously unconnected information with domain knowledge provides new insights and enables intelligent business applications. However, knowledge graph construction is a large investment which requires a joint effort of domain and technical experts. This paper presents a practical step-by-step procedure model for building an RDF knowledge graph that interconnects heterogeneous data and expert knowledge for an industry use case. The self-contained process adapts the "Cross Industry Standard Process for Data Mining" and uses competency questions throughout the entire development cycle. The procedure model starts with business and data understanding, describes tasks for ontology modeling and the graph setup, and ends with process steps for evaluation and deployment.
Information Retrieval
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the challenges and requirements that arise when building knowledge graphs (KGs) for industry applications. It proposes a practical procedure model, the Procedure Model for Building Knowledge Graphs for Industry Applications (KG-PM), to tackle the following issues:

1. **Integration of Data Silos**: In enterprise environments, data is often scattered across multiple heterogeneous sources that are isolated from one another and lack effective connection and integration. A knowledge graph can integrate information from these sources through a semantic network and thereby provide new insights.
2. **Combination of Domain Knowledge and Technical Knowledge**: Building a knowledge graph requires the joint effort of domain experts and technical experts. Collaboration between these two groups is often difficult, so a systematic approach is needed to ensure effective communication and cooperation.
3. **High Cost and Complexity of Knowledge Graph Construction**: Constructing a knowledge graph is a large investment that requires significant time and resources. Existing general-purpose construction methods do not meet the specific needs of particular industries, while processes tailored to a specific domain are difficult to transfer to other fields.
4. **Continuous Improvement and Evolution of Knowledge Graphs**: A knowledge graph must be updated and refined over time to keep up with changing business needs. Existing methods lack a systematic, iterative process model that supports this.

By proposing a step-by-step model based on the "Cross-Industry Standard Process for Data Mining" (CRISP-DM), the paper aims to provide a guideline for building knowledge graphs for industry applications that covers the entire lifecycle: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The model places particular emphasis on competency questions (CQs), which are used both to determine the scope of the knowledge graph and to evaluate it, so that the resulting graph meets actual business needs.
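
The role of competency questions in scoping and evaluation can be illustrated with a minimal sketch. It is not the paper's implementation: the paper targets an RDF knowledge graph, where each competency question would more likely be expressed as a SPARQL query against a triple store. Here a plain Python set of triples stands in for the graph, and every identifier (`ex:Machine1`, `ex:attachedTo`, the `match` and `evaluate` helpers, and the questions themselves) is a hypothetical example chosen for illustration.

```python
from typing import Callable, List, NamedTuple, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy graph; a real project would load RDF data instead.
GRAPH: Set[Triple] = {
    ("ex:Machine1", "rdf:type", "ex:Machine"),
    ("ex:Machine1", "ex:locatedIn", "ex:PlantA"),
    ("ex:Sensor7", "rdf:type", "ex:Sensor"),
    ("ex:Sensor7", "ex:attachedTo", "ex:Machine1"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


class CompetencyQuestion(NamedTuple):
    text: str                                   # question in natural language
    answer: Callable[[Set[Triple]], List[str]]  # query that should answer it


# Hypothetical competency questions, written down during business understanding.
QUESTIONS = [
    CompetencyQuestion(
        "Which sensors are attached to Machine1?",
        lambda g: [t[0] for t in match(g, p="ex:attachedTo", o="ex:Machine1")],
    ),
    CompetencyQuestion(
        "In which plant is Machine1 located?",
        lambda g: [t[2] for t in match(g, s="ex:Machine1", p="ex:locatedIn")],
    ),
    CompetencyQuestion(
        "Who maintains Machine1?",
        lambda g: [t[2] for t in match(g, s="ex:Machine1", p="ex:maintainedBy")],
    ),
]


def evaluate(graph: Set[Triple], questions: List[CompetencyQuestion]) -> dict:
    """Treat a question as covered when its query returns at least one answer."""
    return {cq.text: bool(cq.answer(graph)) for cq in questions}


report = evaluate(GRAPH, QUESTIONS)
for text, covered in report.items():
    print(("covered  " if covered else "MISSING  ") + text)
```

In this sketch the third question is reported as missing because the toy graph contains no `ex:maintainedBy` statements. In an iterative process, a gap like this would signal that either the ontology and data need to be extended or the question lies outside the agreed scope.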