Abstract: Discretization of real-valued attributes (features) is an important pre-processing task in data mining, particularly for classification problems, and it has received significant attention in the machine learning community (Chmielewski & Grzymala-Busse, 1994; Dougherty et al., 1995; Nguyen & Skowron, 1995; Nguyen, 1998; Liu et al., 2002). Various studies have shown that discretization methods can reduce the amount of data while retaining or even improving predictive accuracy. Moreover, discretization makes learning faster (Dougherty et al., 1995). However, most typical discretization methods are univariate, so they may fail to capture correlations among attributes and thereby degrade the performance of a classification model. As reported by Liu et al. (2002), the numerous discretization methods available in the literature can be categorized along several dimensions: dynamic vs. static, local vs. global, splitting vs. merging, direct vs. incremental, and supervised vs. unsupervised. A hierarchical framework has also been given to categorize the existing methods and pave the way for further development. Much work has been done, but many issues remain unsolved, and new methods are needed (Liu et al., 2002). With so many discretization methods available, how should one evaluate their effects? In this study, we focus on simplicity-based criteria subject to preserving consistency, where simplicity is measured by the number of cuts: the fewer cuts a discretization method produces, the better its effect. Hence, discretization of real-valued attributes can be defined as the problem of finding a globally minimal set of cuts on the attribute domains while preserving consistency, which has been shown to be NP-hard (Nguyen, 1998).
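To make the evaluation criterion concrete, the following is a minimal Python sketch of how a candidate set of cuts could be scored: it counts the cuts and checks whether the discretized table is still consistent, i.e. no two objects share the same discretized description but carry different decisions. The function names (`discretize`, `is_consistent`, `num_cuts`), the dictionary layout for cuts, and the toy data are all assumptions made for illustration; they are not taken from the study.

```python
from bisect import bisect_right


def discretize(row, cuts):
    """Map each real value to the index of the interval defined by its cuts.

    `cuts` maps an attribute index to a list of cut points (hypothetical
    layout). A value equal to a cut point falls into the upper interval.
    """
    return tuple(
        bisect_right(sorted(cuts.get(attr, [])), value)
        for attr, value in enumerate(row)
    )


def is_consistent(rows, decisions, cuts):
    """True if objects with equal discretized rows never differ in decision."""
    seen = {}
    for row, decision in zip(rows, decisions):
        key = discretize(row, cuts)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def num_cuts(cuts):
    """Simplicity measure: total number of cuts over all attributes."""
    return sum(len(points) for points in cuts.values())


# Toy decision table (illustrative data only): two real-valued attributes.
rows = [(1.0, 5.0), (2.0, 5.5), (3.0, 1.0), (4.0, 1.5)]
decisions = [0, 0, 1, 1]

cuts_small = {0: [2.5]}                 # one cut on attribute 0
cuts_large = {0: [1.5, 3.5], 1: [3.0]}  # three cuts, also consistent
cuts_bad = {0: [1.5]}                   # merges objects with different decisions

for name, cuts in [("small", cuts_small), ("large", cuts_large), ("bad", cuts_bad)]:
    print(name, num_cuts(cuts), is_consistent(rows, decisions, cuts))
```

Under this criterion, `cuts_small` would be preferred over `cuts_large` because both preserve consistency but the former uses fewer cuts, while `cuts_bad` is rejected outright. The sketch only evaluates a given set of cuts; searching for a globally minimal set is the NP-hard part and is not attempted here.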
Rough set theory (Pawlak, 1982) is regarded as an effective mathematical tool for dealing with uncertain, imprecise and incomplete information, and it has been successfully applied in fields such as knowledge discovery, decision support, and pattern classification. However, rough set theory is suited only to discrete attributes, so real-valued attributes must be discretized in a pre-processing step. Moreover, attribute reduction is another key problem in rough set theory, and finding a minimal
