Efficient RDF Graph Storage based on Reinforcement Learning

Lei Zheng, Ziming Shen, Hongzhi Wang
DOI: https://doi.org/10.48550/arXiv.2010.11538
2020-10-22
Abstract: Knowledge graph is an important cornerstone of artificial intelligence. The construction and release of large-scale knowledge graphs in various fields pose new challenges to knowledge graph data management. Due to the maturity and stability, relational database is also suitable for RDF data storage. However, the complex structure of RDF graph brings challenges to storage structure design for RDF graph in the relational database. To address the difficult problem, this paper adopts reinforcement learning (RL) to optimize the storage partition method of RDF graph based on the relational database. We transform the graph storage into a Markov decision process, and develop the reinforcement learning algorithm for graph storage design. For effective RL-based storage design, we propose the data feature extraction method of RDF tables and the query rewriting priority policy during model training. The extensive experimental results demonstrate that our approach outperforms existing RDF storage design methods.
Databases
What problem does this paper attempt to address?
The paper addresses the challenge of storing large-scale RDF graph data in a relational database. Because RDF graphs have a complex structure, traditional storage methods struggle to manage this data efficiently. The paper proposes using reinforcement learning (RL) to optimize how RDF graph data is partitioned into tables in a relational database. It casts the graph storage problem as a Markov decision process (MDP) and develops an RL-based algorithm for graph storage design. To make the RL-based design effective, the paper also proposes a feature extraction method for RDF tables and a priority-based query rewriting strategy used during model training.

### Main contributions of the paper:

1. **Applying reinforcement learning to the RDF graph storage problem for the first time**: As far as the authors know, this is the first work to use reinforcement learning to solve the graph storage problem.
2. **Designing an effective reinforcement learning model**: A reinforcement learning model is designed for the graph storage problem, so that the storage scheme can be optimized automatically.
3. **Experimental verification**: The proposed method is evaluated through extensive experiments. The results show that it outperforms existing RDF storage design methods, such as Apache Jena, in terms of time performance.

### Main technical details:

- **State**: The state represents how the data is currently partitioned into tables in the database. In the initial state, a single table stores all the data. As the algorithm progresses, tables are split or merged to form new states.
- **Action**: There are two kinds of action: splitting a table and merging tables. Splitting a table selects records from one table according to their predicates and stores them in a new table. Merging combines the records of two tables into a new table according to the join conditions.
- **Reward**: The reward is the feedback the environment returns after an action is executed, and it is used to evaluate the effect of that action. In the graph storage problem, the reward is calculated from the change in the execution time of the workload queries under the current table state. The shorter the query time, the higher the reward.
- **Predicate-vector mapping**: To map the features of large-scale graph data into fixed-length vectors, the paper proposes a predicate-vector mapping method. Predicates are mapped to integers, and spacer encodings are inserted between different tables, which yields a fixed-length vector that serves as the input of the neural network.
- **Double Deep Q-Network (DDQN)**: DDQN is used to address the overestimation of Q-values. It decouples the selection of the action from the evaluation of that action when computing the target Q-value, which improves the stability and accuracy of the model.
- **Query rewriting strategy**: To adapt to changes in the storage structure, the paper proposes a priority-based query rewriting strategy that aims to produce the optimal query execution plan under each storage structure.

### Experimental results:

The paper verifies the effectiveness of the proposed method through extensive experiments. The results show that the method is significantly superior to existing RDF storage design methods, such as Apache Jena, in terms of time performance.

### Conclusion:

The paper proposes a reinforcement-learning-based method for optimizing RDF graph data storage. By casting the graph storage problem as a Markov decision process and designing an effective reinforcement learning model together with a query rewriting strategy, it automates the optimization of the storage scheme. The experimental results verify the effectiveness and superiority of this method.
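
The state, the two actions, and the predicate-vector mapping listed under the technical details can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that a table can be modeled as the set of predicates it stores, that a split moves a single predicate into a new table, that a merge is a plain union (the join itself is not modeled), and that `0` is the spacer value and `-1` the padding value. All names (`predicate_ids`, `split`, `merge`, `encode`) are hypothetical.

```python
from typing import Dict, FrozenSet, List

SEPARATOR = 0   # assumed spacer value placed between tables
PADDING = -1    # assumed filler value used to reach the fixed length

# Assumption: each table is modeled as the set of predicates it stores.
State = List[FrozenSet[str]]


def predicate_ids(predicates: List[str]) -> Dict[str, int]:
    """Map each predicate to a positive integer (0 is reserved for the spacer)."""
    return {p: i for i, p in enumerate(sorted(set(predicates)), start=1)}


def split(state: State, table: int, predicate: str) -> State:
    """Split action: move one predicate's records out of `table` into a new table."""
    source = state[table]
    if predicate not in source or len(source) < 2:
        raise ValueError("split needs a table holding the predicate plus at least one other")
    rest = source - {predicate}
    return state[:table] + [rest] + state[table + 1:] + [frozenset({predicate})]


def merge(state: State, a: int, b: int) -> State:
    """Merge action: combine two tables into one (simplified to a set union)."""
    if a == b:
        raise ValueError("merge needs two distinct tables")
    merged = state[a] | state[b]
    kept = [t for i, t in enumerate(state) if i not in (a, b)]
    return kept + [merged]


def encode(state: State, ids: Dict[str, int]) -> List[int]:
    """Encode a state as a fixed-length integer vector with spacers between tables."""
    length = 2 * len(ids) - 1  # n predicate slots plus at most n - 1 spacers
    vector: List[int] = []
    for i, table in enumerate(state):
        if i > 0:
            vector.append(SEPARATOR)
        vector.extend(sorted(ids[p] for p in table))
    return vector + [PADDING] * (length - len(vector))


ids = predicate_ids(["knows", "name", "age"])  # age=1, knows=2, name=3
initial: State = [frozenset(ids)]              # one table stores everything
after_split = split(initial, 0, "name")
print(encode(initial, ids))      # → [1, 2, 3, -1, -1]
print(encode(after_split, ids))  # → [1, 2, 0, 3, -1]
```

The vector length is fixed by the number of distinct predicates alone, so every reachable state in this simplified model produces an input of the same size, which is the property the mapping is meant to provide for the neural network.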
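
The reward and the DDQN target described under the technical details can be sketched as follows. The reward shown here (previous workload time minus current workload time) is only one assumed way to make shorter query times yield higher rewards; the summary does not give the paper's exact formula. The target computation follows the standard Double DQN rule, in which the online network selects the next action and the target network evaluates it. The function names and the default discount factor `gamma=0.9` are hypothetical.

```python
from typing import Sequence


def reward(previous_seconds: float, current_seconds: float) -> float:
    """Assumed reward: the reduction in total workload time caused by the action."""
    return previous_seconds - current_seconds


def double_dqn_target(
    r: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.9,  # assumed discount factor
    done: bool = False,
) -> float:
    """Standard Double DQN target for a single transition."""
    if done:
        return r
    # Selection: the online network picks the best next action.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # Evaluation: the target network scores that chosen action.
    return r + gamma * next_q_target[best_action]


r = reward(previous_seconds=12.0, current_seconds=9.5)
target = double_dqn_target(r, [1.0, 3.0, 2.0], [0.5, 1.0, 4.0], gamma=0.5)
print(r, target)  # → 2.5 3.0
```

In this example a plain DQN target would take the maximum of the target network's values (4.0) and give 4.5, whereas the Double DQN target uses the value of the action chosen by the online network (1.0) and gives 3.0. This gap is the overestimation that the decoupling is intended to reduce.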