Goldfish: A Large Scale Semantic Data Store and Query System Based on Boolean Matrix Factorization

Rong GU, Hong-Jian QIU, Wen-Jia YANG, Wei HU, Chun-Feng YUAN, Yi-Hua HUANG
DOI: https://doi.org/10.11897/SP.J.1016.2017.02212
2017-01-01
Chinese Journal of Computers
Abstract: With the rapid development of Internet applications and semantic web technology, the amount of semantic data is exploding. On the one hand, storing and querying semantic data efficiently is important, because many applications can provide better services on top of it. On the other hand, the rapid growth of semantic data makes efficient storage and querying harder in the big data era. The traditional approach to semantic data management is to store and query the data in relational database management systems, which can hardly cope as the data volume grows. To address this problem, this paper proposes a distributed hierarchical storage architecture, based on the OpenRDF Sesame framework, for storing and querying large-scale semantic data. The RDF storage mechanism is optimized by replacing the RDF triple store with an attribute table. To handle large volumes of semantic data, a parallel frequent itemset mining algorithm built on the Spark framework is proposed to generate the index of the attribute table. Moreover, a layer of optimized hash conversion is proposed to avoid spending time on frequent hash table lookups during the query stage. To evaluate the efficiency of the proposed approach, we implement a prototype system called Goldfish and compare it with existing systems using a large-scale synthetic dataset and a real dataset. Experimental results show that Goldfish is around 8 times faster than Rainbow, 500 times faster than Jena-HBase, and 1200 times faster than SHARD, a MapReduce-based RDF querying system.
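The two storage ideas named in the abstract, an attribute table in place of a triple store and frequent itemset mining over predicates, can be illustrated with a small single-machine sketch. It is not the paper's method: the toy triples, the function names `build_attribute_table` and `frequent_predicate_sets`, and the brute-force counting are all assumptions made for illustration, standing in for the parallel Spark algorithm the authors describe.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("carol", "name", "Carol"),
]


def build_attribute_table(triples):
    """Pivot triples into one row per subject with one column per predicate."""
    table = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        table[subject][predicate].append(obj)
    return {subject: dict(cols) for subject, cols in table.items()}


def frequent_predicate_sets(table, min_support=2, max_size=2):
    """Brute-force count of predicate sets shared by at least min_support subjects."""
    counts = Counter()
    for cols in table.values():
        predicates = sorted(cols)
        for size in range(1, max_size + 1):
            counts.update(combinations(predicates, size))
    return {pset: n for pset, n in counts.items() if n >= min_support}


table = build_attribute_table(TRIPLES)
frequent = frequent_predicate_sets(table)
```

In this sketch, predicates that frequently occur together on the same subjects (here `age` and `name`) are natural candidates for being grouped into one indexed attribute table, so a query touching both reads one row instead of joining several triples.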