WENN for Individualized Cleaning in Imbalanced Data

Hongjiao Guan, Yingtao Zhang, Min Xian, H. D. Cheng, Xianglong Tang
DOI: https://doi.org/10.1109/icpr.2016.7899676
2016-01-01
Abstract: This paper proposes individualized cleaning for diverse imbalanced data sets. Existing data cleaning techniques have difficulty with rare cases and outliers in the minority class, especially in highly imbalanced data. This drawback leads to incomplete and imprecise removal of examples. To make data cleaning more robust and thorough, we propose weighted edited nearest neighbor (WENN), which detects and removes noisy examples from both classes. It considers the individual characteristics of each imbalanced data set, including global class imbalance and local distribution. The main idea of the proposed method is to put more focus on the majority class than on the minority class during data cleaning. Extensive experiments on synthetic and real data validate the superiority of our approach over other data cleaning methods.
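The abstract does not give the WENN weighting scheme, so the following is only a minimal sketch of the general idea: plain edited nearest neighbor (an example is removed when its k nearest neighbors outvote its own label) with a class-dependent vote weight. The function name `weighted_enn`, the `class_weight` parameter, and the inverse-class-size weights are assumptions made for illustration, not the authors' method. With these assumed weights, minority neighbors count more, so cleaning falls more heavily on the majority class.

```python
import math
from collections import Counter


def weighted_enn(X, y, k=3, class_weight=None):
    """Return indices of examples kept after one ENN-style editing pass.

    Illustrative sketch only; the weighting is an assumption, not the
    scheme defined in the paper.
    """
    counts = Counter(y)
    if class_weight is None:
        # Hypothetical choice: a neighbor's vote is inversely proportional
        # to the size of its class, so minority neighbors count more.
        class_weight = {c: 1.0 / n for c, n in counts.items()}

    keep = []
    for i, xi in enumerate(X):
        # k nearest neighbors of example i, excluding i itself.
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]

        # Accumulate weighted votes per class among the neighbors.
        votes = Counter()
        for _, j in neighbors:
            votes[y[j]] += class_weight[y[j]]

        own = votes.get(y[i], 0.0)
        other = max((v for c, v in votes.items() if c != y[i]), default=0.0)

        # Keep the example unless another class strictly outvotes its label.
        if own >= other:
            keep.append(i)
    return keep


# Toy data: a majority cluster (label 0), a small minority cluster (label 1),
# and one majority example (index 6) sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (5.1, 5.0),
     (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

kept = weighted_enn(X, y, k=3)
print(kept)  # index 6 is dropped; all other examples are kept
```

In this toy run, the three minority examples survive even though each has a majority example among its three nearest neighbors, while the stray majority example is removed.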