Efficient Frequent Pattern Mining in Relational Databases.

Xuequn Shang, Kai-Uwe Sattler, Ingolf Geist
2004-01-01
Abstract: Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations, both because of the prohibitive cost of extracting knowledge and because of the lack of suitable declarative query language support. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Propad (PROjection PAttern Discovery). Propad differs fundamentally from the Apriori-like candidate set generation-and-test approach: it successively projects the transaction table into frequent itemsets, which avoids both multiple passes over the large original transaction table and the generation of huge sets of candidates. We evaluated performance on a DBMS (IBM DB2 UDB EEE V8) and compared the results with the K-Way join approach proposed in [Sarawagi et al., 1998] and the SQL-based FP-tree approach proposed in [Shang et al., 2004]. The experimental results show that our algorithm performs efficiently.
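The projection idea summarized in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's SQL implementation of Propad; it is a generic pattern-growth analogue, assuming transactions are given as lists of comparable items. The names `mine_frequent_itemsets`, `grow`, and `min_support` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support count} via recursive projection.

    Illustrative sketch only: an in-memory analogue of projection-based
    mining, not the SQL-based Propad algorithm from the paper.
    """
    results = {}

    def grow(prefix, projected):
        # Count the support of each item in the current projected table.
        counts = Counter(item for t in projected for item in t)
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep only transactions containing `item`, restricted
            # to items that sort after it, so each itemset is produced once.
            sub = [tuple(i for i in t if i > item) for t in projected if item in t]
            sub = [t for t in sub if t]  # drop transactions left empty
            grow(pattern, sub)

    # Deduplicate items within each transaction before mining.
    grow((), [tuple(sorted(set(t))) for t in transactions])
    return results


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, support in sorted(mine_frequent_itemsets(demo, 2).items()):
        print(itemset, support)
```

Each recursive call works only on the projected subset of transactions, so the original table is scanned once at the top level and no candidate itemsets are generated and tested against it, which is the contrast with Apriori-style methods that the abstract draws.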