Data2Neo -- A Tool for Complex Neo4j Data Integration

Julian Minder, Laurence Brandenberger, Luis Salamanca, Frank Schweitzer
2024-06-12
Abstract: This paper introduces Data2Neo, an open-source Python library for converting relational data into knowledge graphs stored in Neo4j databases. With extensive customization options and support for continuous online data integration from various data sources, Data2Neo is designed to be user-friendly, efficient, and scalable to large datasets. The tool significantly lowers the barrier to entry for creating and using knowledge graphs, making this increasingly popular form of data representation accessible to a wider audience. The code is available at https://github.com/jkminder/data2neo.
Databases
What problem does this paper attempt to address?
The paper addresses how to convert relational data into a knowledge graph stored in a Neo4j database in a way that is efficient, flexible, and easy to use. It introduces Data2Neo, an open-source Python library that simplifies this process through a wide range of customization options and support for continuous online data integration from various data sources. The design goal of Data2Neo is to lower the barrier to creating and using knowledge graphs by offering a tool that is user-friendly, efficient, and scalable for a wider group of users.

### Main problem points:

1. **Limitations of relational databases**: Because of their static table structures and fixed sets of columns, relational databases have serious limitations in handling dynamic data and complex queries.
2. **Deficiencies of existing solutions**:
   - **Direct Import (DI)**: Simple and efficient, but complex custom conversions or real-time data updates must be expressed in Cypher queries, which increases complexity.
   - **ETL tools**: Able to handle general data conversion tasks, but their breadth of functionality can make them overly complicated in research and non-enterprise environments.
   - **Programming methods**: They provide complete control and flexibility, but development and maintenance costs are high, especially when dynamic expansion or parallel processing is required.
3. **Challenges of data integration**: Integrating existing relational data into a knowledge graph may not be intuitive, especially when the conversion pipeline is complex, for example when it involves data cleaning or real-time updates.

### Solutions:

Data2Neo addresses these problems in the following ways:

- **Abstract conversion recipes**: Users define how relational data is converted into a knowledge graph through a YAML-like conversion recipe.
- **Custom pipeline steps**: Users can easily integrate custom pre-processing and post-processing functions, which can contain arbitrary Python code for operations such as data filtering, cleaning, and enrichment.
- **Multi-source data conversion and stream processing**: It can convert and stream-process data from any data source.
- **Optimized parallel processing**: It automatically parallelizes data processing and knowledge graph generation to improve efficiency.

In summary, Data2Neo aims to simplify the conversion of relational data into a knowledge graph by providing a user-friendly, flexible, and efficient tool, thereby lowering the barrier to data integration.
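To make the idea of a declarative recipe plus custom pipeline steps more concrete, here is a minimal sketch in plain Python. It does not use Data2Neo and does not reproduce its recipe syntax or API: the `RECIPE` dictionary layout, the `strip_whitespace` pre-processor, and the `convert_row` and `convert` functions are all hypothetical names chosen for illustration. The sketch only builds nodes and relationships in memory and leaves out writing to Neo4j entirely.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical recipe format (NOT Data2Neo's schema syntax): each node spec
# names a label, the column used as its key, and a property -> column mapping.
RECIPE = {
    "nodes": [
        {"label": "Person", "key": "author", "props": {"name": "author"}},
        {"label": "Paper", "key": "title",
         "props": {"title": "title", "year": "year"}},
    ],
    "relationships": [
        {"type": "WROTE", "from": "Person", "to": "Paper"},
    ],
}


def strip_whitespace(row):
    """Illustrative pre-processing step: trim every string value in a row."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def convert_row(row, recipe, preprocessors):
    """Apply the pre-processors, then build this row's nodes and relationships."""
    for step in preprocessors:
        row = step(row)
    nodes = {}
    for spec in recipe["nodes"]:
        props = {prop: row[col] for prop, col in spec["props"].items()}
        nodes[spec["label"]] = {
            "label": spec["label"],
            "key": row[spec["key"]],
            "props": props,
        }
    rels = [
        (nodes[r["from"]]["key"], r["type"], nodes[r["to"]]["key"])
        for r in recipe["relationships"]
    ]
    return list(nodes.values()), rels


def convert(rows, recipe, preprocessors=(), workers=4):
    """Convert rows in parallel, then merge nodes that share (label, key)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input rows.
        results = list(
            pool.map(lambda r: convert_row(r, recipe, preprocessors), rows)
        )
    merged_nodes, all_rels = {}, []
    for row_nodes, row_rels in results:
        for node in row_nodes:
            merged_nodes[(node["label"], node["key"])] = node
        all_rels.extend(row_rels)
    return list(merged_nodes.values()), all_rels


rows = [
    {"author": " Ada ", "title": "Graphs", "year": 2020},
    {"author": "Ada", "title": "Tables", "year": 2021},
]
nodes, rels = convert(rows, RECIPE, preprocessors=[strip_whitespace])
```

In this toy run the pre-processing step normalizes `" Ada "` to `"Ada"`, so both rows map to the same `Person` node. The recipe stays purely declarative, while cleaning logic lives in ordinary Python functions, which is the separation of concerns the summary attributes to Data2Neo.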