Frequent itemset mining with parallel RDBMS

Xuequn Shang, Kai-Uwe Sattler
DOI: https://doi.org/10.1007/11430919_63
2005-01-01
Abstract: Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets, which avoids making multiple passes over the large original transaction table and generating huge sets of candidates. We have built a parallel database system with DB2 and evaluated its performance. We show that data mining with SQL can achieve sufficient performance through database tuning.
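The projection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Ppropad implementation: it assumes a single-node SQLite database instead of parallel DB2, a hypothetical table layout of (tid, item) rows, and a simple recursive scheme in which each frequent item yields a smaller projected table that is mined in turn. The function name `mine_frequent_itemsets` and the table names `t0`, `t1`, ... are invented for illustration.

```python
import sqlite3
from itertools import count


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support} by recursively projecting a (tid, item) table.

    Illustrative sketch only: single-node SQLite, hypothetical schema,
    not the Ppropad algorithm as published.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t0 (tid INTEGER, item TEXT)")
    # One row per distinct (transaction, item) pair, so COUNT(*) per item is its support.
    conn.executemany(
        "INSERT INTO t0 VALUES (?, ?)",
        [(tid, item) for tid, items in enumerate(transactions) for item in set(items)],
    )
    table_ids = count(1)  # generates names for projected tables
    result = {}

    def mine(table, prefix):
        # Frequent items within the current (projected) table.
        frequent = conn.execute(
            f"SELECT item, COUNT(*) FROM {table} GROUP BY item "
            "HAVING COUNT(*) >= ? ORDER BY item",
            (min_support,),
        ).fetchall()
        for item, support in frequent:
            itemset = prefix + (item,)
            result[itemset] = support
            # Project: keep only transactions containing `item`, and only the
            # items that sort after it, so each itemset is enumerated once.
            proj = f"t{next(table_ids)}"
            conn.execute(f"CREATE TEMP TABLE {proj} (tid INTEGER, item TEXT)")
            conn.execute(
                f"INSERT INTO {proj} SELECT tid, item FROM {table} "
                f"WHERE item > ? AND tid IN (SELECT tid FROM {table} WHERE item = ?)",
                (item, item),
            )
            mine(proj, itemset)
            conn.execute(f"DROP TABLE {proj}")

    mine("t0", ())
    conn.close()
    return result


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, support in sorted(mine_frequent_itemsets(demo, 2).items()):
        print(itemset, support)
```

In this sketch the original table is scanned only at the top level; every deeper step works on a projected table that shrinks with each added item, and no candidate itemsets are generated ahead of counting. How Ppropad partitions this work across database nodes is not described in the abstract and is not modeled here.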