Ordered Equivalence Class Partition On Hash
Jiang Yunliang, Fan Jing, Liu Yong
2010-01-01
Abstract: Performance analysis of an algorithm is generally expressed only in terms of the size of the data set. In practice, however, ignoring the data distribution weakens the analysis of how well the algorithm adapts to real data. Rough set theory, proposed by Pawlak in the early 1980s, is a mathematical tool used in computer applications. It has been successfully applied in machine learning, pattern recognition, expert systems, data analysis, and other areas. In rough set theory, data are presented as a two-dimensional table. Once the table is ordered, the equivalence classes are easy to obtain. This is because the attributes and their value ranges in the table determine the equivalence classes, and two concepts, saturation and concentration, describe the distribution of attribute values. In this paper, we propose the new concept of an ordered equivalence class, which extends the equivalence class. Ordering the two-dimensional table is the main step in obtaining the ordered equivalence classes. Its time complexity is usually O(|U||R| log|U|), where |U| is the number of entries and |R| is the number of attributes. Paper [1] gives a thorough analysis and reduces the time complexity to O(|U| × (|R| + log|U|)). In this paper, we introduce a new hash-based algorithm that obtains the ordered equivalence classes with time complexity O(|U||R|). A comparison shows that the hash algorithm is more efficient than the algorithm in paper [1], especially when saturation and concentration are both high.
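The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of partitioning a table into equivalence classes by hashing, not the authors' method. It assumes the table is a list of rows keyed by attribute name, and the function name `partition_by_hash` and the sample data are hypothetical. Each row is visited once and its |R| attribute values are combined into one hashable key, so the expected cost is O(|U||R|) provided dictionary operations take constant time on average. The classes come out in order of first occurrence, which may differ from the ordering the paper defines.

```python
def partition_by_hash(table, attributes):
    """Group row indices by their values on `attributes` (illustrative only)."""
    classes = {}  # Python dicts preserve insertion order
    for index, row in enumerate(table):
        # Building and hashing the key costs O(|R|) for each row.
        key = tuple(row[a] for a in attributes)
        # Rows with equal keys are indiscernible on `attributes`.
        classes.setdefault(key, []).append(index)
    return classes


# Hypothetical 4-row table with two attributes.
table = [
    {"a": 1, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 0, "b": 0},
]
classes = partition_by_hash(table, ["a", "b"])
print(list(classes.values()))  # [[0, 2], [1], [3]]
```

Rows 0 and 2 agree on both attributes and therefore fall into the same class. No sorting step is involved, which is where the log|U| factor of the sort-based approaches is avoided.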