An algorithm for mining strongly correlated pairs in relational table

Jianpei Zhang, Qiang Li
DOI: https://doi.org/10.1109/ICMLC.2005.1527206
2005-01-01
Abstract: Given a user-specified minimum correlation threshold and a relational table, the problem of mining all strongly correlated pairs is to find every pair of attribute values (items) whose Pearson's correlation coefficient is above the minimum correlation threshold. However, algorithms developed for transaction databases generate invalid candidate pairs because they ignore a fundamental property of itemsets in a relational table (i.e., 1NF: an itemset cannot contain more than one item per table column), and hence incur additional, unnecessary computation. In this paper, we exploit this property by adapting the join step of the candidate generation phase so that itemsets that are not in 1NF are never generated, which prunes the candidate set. Furthermore, we propose other techniques to reduce the number of candidate pairs that must be examined in the refinement step, even when upper-bound-based pruning is ineffective because the correlation threshold is very low. Experimental results on real data sets show that our algorithm produces smaller candidate sets and runs faster than previous algorithms.
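To make the two pruning ideas concrete, the following Python sketch treats each `(column, value)` combination as a binary item and scores item pairs with the phi coefficient, which is Pearson's correlation of two 0/1 indicators. Candidates are skipped in two ways: pairs drawn from the same column, which can never co-occur in a 1NF row, and pairs whose support-based upper bound falls below the threshold. The bound used here is the one from the TAPER all-strong-pairs algorithm; that it is the "upper bound based pruning technique" the abstract refers to is an assumption. All function names are hypothetical, and the paper's additional techniques for very low thresholds are not reproduced.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical item representation (an assumption): an item is a (column_index, value) tuple.


def item_supports(rows):
    """Return the fraction of rows that contain each (column, value) item."""
    n = len(rows)
    counts = Counter((col, val) for row in rows for col, val in enumerate(row))
    return {item: c / n for item, c in counts.items()}


def phi(sa, sb, sab):
    """Phi coefficient: Pearson's correlation of two binary indicators."""
    denom = sqrt(sa * (1 - sa) * sb * (1 - sb))
    return 0.0 if denom == 0 else (sab - sa * sb) / denom


def phi_upper_bound(sa, sb):
    """Support-based upper bound on phi (TAPER-style; an assumption here).

    Follows from supp(AB) <= min(supp(A), supp(B)), so no joint count is needed.
    """
    hi, lo = max(sa, sb), min(sa, sb)
    if hi >= 1.0:  # an item present in every row has zero variance
        return 0.0
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))


def strong_pairs(rows, theta):
    """Return (candidates, strong) for a positive threshold theta."""
    n = len(rows)
    supp = item_supports(rows)
    candidates = []
    for a, b in combinations(list(supp), 2):
        # 1NF: a row holds one value per column, so two items from the same
        # column never co-occur; their phi is negative and cannot reach theta > 0.
        if a[0] == b[0]:
            continue
        # Upper-bound pruning that uses single-item supports only.
        if phi_upper_bound(supp[a], supp[b]) < theta:
            continue
        candidates.append((a, b))
    strong = []
    for a, b in candidates:
        # Refinement: scan the rows to get the joint support of the pair.
        sab = sum(1 for row in rows if row[a[0]] == a[1] and row[b[0]] == b[1]) / n
        r = phi(supp[a], supp[b], sab)
        if r >= theta:
            strong.append((a, b, r))
    return candidates, strong


if __name__ == "__main__":
    rows = [
        ("sunny", "hot", "no"),
        ("sunny", "hot", "no"),
        ("rain", "cool", "yes"),
        ("rain", "cool", "yes"),
    ]
    cands, strong = strong_pairs(rows, 0.9)
    print(len(cands), len(strong))  # 12 6
```

Both filters are cheap because they need only single-item supports; the joint support, which requires a pass over the rows, is computed only for the surviving candidates. In a table with three columns of two values each, the 1NF rule alone removes the 3 same-column pairs out of the 15 possible item pairs.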