An ICA-Based Multivariate Discretization Algorithm

Ye Kang, Shanshan Wang, Xiaoyan Liu, Hokyin Lai, Huaiqing Wang, Baiqi Miao
DOI: https://doi.org/10.1007/11811220_47
2006-01-01
Abstract: Discretization is an important preprocessing technique in data mining. Univariate discretization, the most commonly used approach, discretizes one attribute of a dataset at a time without considering its interaction with the other attributes. Because the target class attribute is determined jointly by multiple attributes rather than by any single one, the result of univariate discretization is not optimal. This paper proposes a new multivariate discretization algorithm. It uses ICA (Independent Component Analysis) to transform the original attributes into a space of independent attributes and then applies univariate discretization to each attribute in the new space. Data mining tasks can then be conducted on the resulting discretized dataset of independent attributes. Numerical experiments show that the method improves discretization performance, especially on non-Gaussian datasets, and that it is competitive with a PCA-based multivariate method.
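The two-stage pipeline in the abstract (transform the attributes into independent components, then discretize each component on its own) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, a hand-written symmetric FastICA with the tanh (log-cosh) contrast because the abstract does not say which ICA estimator was used, and equal-frequency binning as a hypothetical stand-in for the univariate discretizer. The function names `whiten`, `fast_ica`, `equal_frequency_bins`, and `ica_discretize`, and the synthetic data, are illustrative only.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples x n_features) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric whitening matrix cov^(-1/2): output columns are uncorrelated, unit variance
    W_white = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return Xc @ W_white

def _sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh (assumed estimator); returns estimated sources."""
    Z = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)            # g(w_i^T z) for every sample and component
        g_prime = 1.0 - g ** 2          # derivative of tanh
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = _sym_decorrelate((g.T @ Z) / n - np.diag(g_prime.mean(axis=0)) @ W)
        # Converged when each new row is (anti)parallel to the corresponding old row
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T

def equal_frequency_bins(x, n_bins):
    """Hypothetical univariate discretizer: map a 1-D array to quantile bins 0..n_bins-1."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def ica_discretize(X, n_bins=4, seed=0):
    """Project X onto independent components, then discretize each component separately."""
    S = fast_ica(X, seed=seed)
    return np.column_stack([equal_frequency_bins(S[:, j], n_bins) for j in range(S.shape[1])])

# Usage on synthetic data: two non-Gaussian sources mixed by an assumed matrix A
rng = np.random.default_rng(42)
sources = np.column_stack([rng.uniform(-1.0, 1.0, 1000), rng.laplace(size=1000)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = sources @ A.T
D = ica_discretize(X, n_bins=4)
print(D.shape)  # (1000, 2)
```

Each column of `D` holds the bin index of one independent component. Because the components are (approximately) independent, discretizing them one at a time loses less of the joint structure than binning the original correlated attributes separately, which is the motivation the abstract gives. A supervised univariate method could be dropped in place of `equal_frequency_bins` without changing the rest of the sketch.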