Parallel Mining Algorithm of Association Rules on Massive Data

张兆功,李建中,张艳秋
DOI: https://doi.org/10.3321/j.issn:0367-6234.2004.05.001
2004-01-01
Abstract: This paper discusses techniques for mining association rules on massive data, with an emphasis on efficient parallel algorithms. Association rule mining is an essential research area, and many mining algorithms have been proposed. Because the databases involved are large, however, these algorithms take a very long time to run. Parallel computing is an effective remedy, but existing parallel algorithms all suffer from limited scalability and a weak ability to handle massive data. We propose a new parallel sampling algorithm that ignores local frequent itemsets that emerge on only 1/4 of all node computers. The algorithm exploits the self-managing nodes of a cluster computer and their parallel disk I/O to improve the capacity and efficiency of sampling when dealing with massive data. Theoretical analysis and experimental results show that the speedup of the algorithm approaches the number of nodes p in the cluster and that its communication complexity is logarithmic in p. The algorithm also offers better scalability, a stronger ability to handle massive data, and high precision.
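The abstract does not give the algorithm's steps, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that each node samples its own partition, mines local frequent itemsets from the sample, and that an itemset is dropped when it is locally frequent on no more than a quarter of the nodes (one possible reading of the 1/4 criterion). The function names, the `sample_rate` and `min_node_fraction` parameters, and the brute-force counting are all hypothetical, and the nodes are simulated sequentially in one process rather than on a cluster.

```python
import random
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, sample_rate,
                            max_size=2, seed=0):
    """Mine frequent itemsets from a random sample of one node's partition.

    Hypothetical helper: brute-force counting of all itemsets up to max_size.
    """
    rng = random.Random(seed)
    # Bernoulli sampling: keep each transaction with probability sample_rate.
    sample = [t for t in transactions if rng.random() < sample_rate]
    if not sample:
        return set()
    counts = Counter()
    for t in sample:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(sample)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def merge_candidates(per_node_itemsets, min_node_fraction=0.25):
    """Keep itemsets that are locally frequent on more than
    min_node_fraction of the nodes (assumed reading of the 1/4 rule)."""
    votes = Counter()
    for itemsets in per_node_itemsets:
        votes.update(itemsets)
    cutoff = min_node_fraction * len(per_node_itemsets)
    return {itemset for itemset, v in votes.items() if v > cutoff}


# Four simulated nodes, each holding its own partition of the database.
partitions = [
    [["a", "b"], ["a", "b"], ["a", "c"]],
    [["a", "b"], ["a"], ["b"]],
    [["a", "d"], ["a", "d"], ["d"]],
    [["a"], ["a", "b"], ["c"]],
]
local = [local_frequent_itemsets(p, min_support=0.6, sample_rate=1.0)
         for p in partitions]
candidates = merge_candidates(local)
print(sorted(candidates))  # [('a',), ('b',)]
```

In this toy run, the itemsets ('d',), ('a', 'd') and ('a', 'b') are each locally frequent on a single node out of four, so they are discarded before any global counting. On a real cluster the per-node mining would run in parallel and the merge would be a collective reduction; how the paper achieves communication logarithmic in p is not stated in the abstract and is not modeled here.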