Path Pattern Query Processing on Large Graphs

Yiyuan Bai, Chaokun Wang, Xiang Ying
DOI: https://doi.org/10.1109/bdcloud.2014.101
2016-01-01
World Wide Web
Abstract: Graph data management and mining have many diverse applications in real-world scientific research and business activities. As one of the most basic operations, uniform path pattern query processing on graph data faces three major challenges, which this paper addresses as follows. First, a new graph query language called G-Path is presented, which focuses on complex path pattern query processing on very large graphs. The design of a system called HDGL is also proposed; it is based on a BSP-like model as well as the MapReduce model and can effectively handle distributed graph data operations and queries. Second, an implementation of HDGL on Hadoop, the de facto cloud platform, is presented. The query processing of a G-Path statement in HDGL is detailed, based on the concept of a distributed state machine. In addition, several query optimization techniques for G-Path queries are used to dramatically improve query execution performance. Finally, extensive experiments on several graph data sets demonstrate the usability of the G-Path query language and the effectiveness of HDGL.