Using MapReduce Platform to Achieve Efficient Parallel Mining of Frequent Subgraphs

Heli SUN, Qiang CHEN, Wei LIU, Jianbin HUANG, Jianhua ZOU
DOI: https://doi.org/10.3778/j.issn.1673-9418.1403027
2014-01-01
Abstract: Frequent subgraph mining is an important and widely used problem in data mining. This paper proposes Cloud-GFSG (cloud-global frequent subgraph), an efficient algorithm that uses MapReduce on the Hadoop platform to mine frequent subgraphs. The algorithm follows the Apriori principle. When it generates new subgraphs by edge extension, it uses the discovered frequent subgraphs of size k-1 to generate the candidate frequent subgraphs of size k. It also checks whether a subgraph about to be generated already exists, and it applies frequent subgraph generation rules that guarantee each frequent subgraph is produced only once. Compared with state-of-the-art algorithms, the proposed algorithm is more generally applicable and avoids generating duplicate graphs when extending a new edge. Its correctness is therefore ensured, and its efficiency is significantly improved.
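The abstract gives no pseudocode, so the following is only a minimal single-machine sketch of the general idea under stated assumptions, not Cloud-GFSG itself. It assumes undirected graphs with vertex labels only, measures subgraph size in edges, simulates the MapReduce shuffle with a Python dictionary instead of Hadoop, and substitutes a brute-force canonical form (smallest code over all vertex orderings) for whatever canonical labeling the authors use. All function names (`canonical_code`, `map_extend`, `mine_frequent_subgraphs`) are hypothetical. Each level grows the embeddings of frequent (k-1)-edge subgraphs by one adjacent edge inside each input graph; a per-graph set removes edge subsets reached from several parents, and the canonical code merges isomorphic candidates so each pattern is counted once.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical representation (not from the paper): a graph is a pair
# (vertex_labels, edges); vertex_labels maps vertex id -> label and edges is
# a list of (u, v) pairs of an undirected graph without edge labels.


def canonical_code(vertex_labels, edges, edge_ids):
    """Brute-force canonical form of the subgraph made of edge_ids.

    Tries every ordering of the subgraph's vertices and keeps the smallest
    (labels, sorted edges) tuple, so isomorphic subgraphs share one code.
    Exponential in the number of vertices: only suitable for tiny examples.
    """
    verts = sorted({x for e in edge_ids for x in edges[e]})
    best = None
    for perm in permutations(verts):
        pos = {v: i for i, v in enumerate(perm)}
        labels = tuple(vertex_labels[v] for v in perm)
        es = tuple(sorted(tuple(sorted((pos[edges[e][0]], pos[edges[e][1]])))
                          for e in edge_ids))
        code = (labels, es)
        if best is None or code < best:
            best = code
    return best


def map_extend(gid, graph, frontier):
    """Map step for one graph: grow each frequent (k-1)-edge subset by one
    adjacent edge and emit (canonical code, (graph id, edge subset))."""
    vertex_labels, edges = graph
    grown = set()  # removes k-edge subsets reached from several parents
    for subset in frontier:
        touched = {x for e in subset for x in edges[e]}
        for e, (u, v) in enumerate(edges):
            # The empty subset (level 1) may take any edge; otherwise the
            # new edge must share a vertex so the subgraph stays connected.
            if e not in subset and (not subset or u in touched or v in touched):
                grown.add(subset | {e})
    for subset in grown:
        yield canonical_code(vertex_labels, edges, subset), (gid, subset)


def mine_frequent_subgraphs(database, min_support, max_edges):
    """Level-wise (Apriori-style) driver: one simulated MapReduce job per level.

    Returns {canonical code: support}, where support counts distinct graphs.
    """
    frontiers = {gid: {frozenset()} for gid in range(len(database))}
    result = {}
    for _ in range(max_edges):
        # Map: each graph is processed independently (one mapper per graph).
        shuffled = defaultdict(list)
        for gid, graph in enumerate(database):
            for code, value in map_extend(gid, graph, frontiers[gid]):
                shuffled[code].append(value)
        # Reduce: keep patterns found in at least min_support graphs and
        # retain their embeddings as the next level's frontier.
        frontiers = {gid: set() for gid in range(len(database))}
        for code, values in shuffled.items():
            support = len({gid for gid, _ in values})
            if support >= min_support:
                result[code] = support
                for gid, subset in values:
                    frontiers[gid].add(subset)
        if not any(frontiers.values()):
            break  # no frequent pattern left to extend
    return result


if __name__ == "__main__":
    example_db = [
        ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),  # triangle
        ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)]),          # path A-B-C
        ({0: "A", 1: "B"}, [(0, 1)]),                          # single edge
    ]
    for code, support in mine_frequent_subgraphs(example_db, 2, 3).items():
        print(support, code)
```

The pruning in the reduce step relies on the Apriori property: every connected k-edge subgraph has a connected (k-1)-edge subgraph (drop a cycle edge or a leaf edge), and that smaller subgraph is at least as frequent, so keeping only embeddings of frequent patterns loses no frequent candidates. On the three-graph example with a minimum support of 2, the sketch should report A-B, B-C, and the path A-B-C, while the triangle and A-C fall below the threshold.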