Not All RDF is Created Equal: Investigating RDF Load Times on Resource-Constrained Devices

Piotr Sowinski, Anh Le-Tuan, Pawel Szmeja, Maria Ganzha
2024-08-29
Abstract: As the role of knowledge-based systems in IoT keeps growing, ensuring resource efficiency of RDF stores becomes critical. However, up until now, benchmarks of RDF stores were most often conducted with only one dataset, and the differences between the datasets were not explored in detail. In this paper, our objective is to close this research gap by experimentally evaluating the load times of eight diverse RDF datasets from the RiverBench benchmark suite. In the experiments, we use five different RDF store implementations and several resource-constrained hardware platforms. To analyze the results, we introduce the notion of relative loading speed (RLS), allowing us to observe that the loading speed can differ between datasets by as much as a factor of 9.01. This serves as clear evidence that "not all RDF is created equal" and stresses the importance of using multiple benchmark datasets in evaluations. We outline the possible reasons for this drastic difference, which should be further investigated in future work. To this end, we published the data, code, and the results of our experiments.
Databases
### What problems does this paper attempt to solve?

This paper addresses the following problems:

1. **Differences in RDF loading performance on resource-constrained devices**:
   - As the role of knowledge-based systems in the Internet of Things (IoT) grows, ensuring the resource efficiency of RDF stores becomes critical. However, existing benchmarks usually rely on a single dataset and do not examine in detail how datasets differ from one another.
   - The paper experimentally evaluates the load times of eight diverse RDF datasets from the RiverBench benchmark suite, using five RDF store implementations on several resource-constrained hardware platforms. It finds that loading speed can differ between datasets by as much as a factor of 9.01. This shows that "not all RDF is created equal" and underlines the importance of using multiple benchmark datasets in evaluations.
2. **Limitations of existing benchmarks**:
   - Most existing RDF store benchmarks focus on a single dataset or a specific application scenario and do not account for the diversity of datasets. As a result, they cannot fully reflect how RDF stores perform across practical applications.
   - The paper introduces the notion of relative loading speed (RLS) to analyze the performance differences between datasets, and outlines possible reasons for these differences, which should be investigated further in future work.
3. **Multi-dimensional evaluation of RDF stores**:
   - Besides loading speed, the paper also evaluates other performance indicators, such as RAM usage, CPU time, and disk usage. These measurements provide a more comprehensive reference for future design choices.
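To make the idea of a relative loading speed more concrete, here is a minimal Python sketch. The summary above does not reproduce the paper's exact definition of RLS, so the definition used here is an assumption: each dataset's loading speed (triples per second) is divided by the geometric mean of the loading speeds of all datasets measured with the same store on the same hardware. The function name `relative_loading_speeds`, the dataset names, and all numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from statistics import geometric_mean


def relative_loading_speeds(triples, seconds):
    """Return an assumed RLS per dataset for one store on one hardware platform.

    ASSUMPTION: RLS = dataset speed / geometric mean of all dataset speeds.
    The paper's actual definition may differ.
    """
    # Absolute loading speed in triples per second for each dataset.
    speeds = {name: triples[name] / seconds[name] for name in triples}
    # Baseline shared by all datasets in this store/hardware combination.
    baseline = geometric_mean(speeds.values())
    return {name: speed / baseline for name, speed in speeds.items()}


# Hypothetical measurements (not taken from the paper).
triples = {"dataset_a": 1_000_000, "dataset_b": 1_000_000, "dataset_c": 1_000_000}
seconds = {"dataset_a": 10.0, "dataset_b": 20.0, "dataset_c": 40.0}

rls = relative_loading_speeds(triples, seconds)
# Ratio between the fastest- and slowest-loading dataset.
spread = max(rls.values()) / min(rls.values())

for name, value in rls.items():
    print(f"{name}: RLS = {value:.2f}")
print(f"fastest/slowest ratio = {spread:.2f}")
```

Normalizing against a shared baseline lets results from different stores and devices be compared on one scale, since the absolute speeds of a fast server-class store and a slow embedded one cancel out. Note that the fastest-to-slowest ratio does not depend on which baseline is chosen, so a figure of that kind can be read independently of the normalization used.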
### Main contributions of the paper

- **Filling a research gap**: By experimentally evaluating the loading performance of multiple RDF datasets on different hardware platforms, the paper addresses a gap in existing research, where performance differences between datasets have not been explored in detail.
- **Introducing a new evaluation metric**: The paper proposes the notion of relative loading speed (RLS) as a way to quantify differences in loading performance between datasets.
- **Releasing public data**: To support follow-up research, the authors publish the data, code, and results of their experiments so that the community can use and verify them.

### Conclusion

The paper demonstrates experimentally that loading performance differs significantly between RDF datasets and emphasizes the importance of using diverse datasets when evaluating RDF stores. It also provides a basis for further research into how the structural characteristics of datasets affect performance.