An optimization approach with weighted SCiForest and weighted Hausdorff distance for noise data and redundant data

Yifeng Zheng, Guohe Li, Ying Li, Wenjie Zhang, Xueling Pan, Yaojin Lin
DOI: https://doi.org/10.1007/s10489-021-02685-9
IF: 5.3
2021-07-30
Applied Intelligence
Abstract: With the development of intelligent technology, data obtained from practical applications may contain noise information (outlier data or redundant data). Such noisy data usually degrades the performance and robustness of classifiers. To address this problem, this paper proposes an optimization method for Outlier samples and Redundant samples Detection (ORD). First, ORD leverages maximum information compression to eliminate irrelevant feature information. Second, an outlier optimization filter called WSCiForest is proposed; it uses a fusion strategy based on entropy weighting and group optimization theory to calculate a distribution estimated score for each sample. Finally, ORD adopts an improved Hausdorff distance to identify redundant samples effectively. The experimental results show that the proposed method can effectively optimize the data space.
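The abstract names maximum information compression as the feature-reduction step without defining it. The sketch below uses the standard maximal information compression index (MICI), which is the smallest eigenvalue of the 2x2 covariance matrix of two features. MICI is zero when one feature is an exact linear function of the other. The greedy filter `select_features` and its `threshold` parameter are hypothetical illustrations, not the authors' procedure. MICI is also scale-dependent, so standardizing features first may be advisable.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def mici(x, y):
    """Maximal information compression index of features x and y:
    the smallest eigenvalue of their 2x2 (population) covariance matrix."""
    mx, my = _mean(x), _mean(y)
    vx = sum((a - mx) ** 2 for a in x) / len(x)
    vy = sum((b - my) ** 2 for b in y) / len(y)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    # Eigenvalues of [[vx, cxy], [cxy, vy]] are
    # ((vx + vy) +/- sqrt((vx - vy)^2 + 4 * cxy^2)) / 2; take the smaller one.
    lam = 0.5 * ((vx + vy) - math.sqrt((vx - vy) ** 2 + 4 * cxy ** 2))
    return max(0.0, lam)  # clamp tiny negative values caused by rounding

def select_features(columns, threshold):
    """Hypothetical greedy filter: keep feature j only if its MICI with every
    already-kept feature exceeds `threshold`, i.e. it is not (nearly) a
    linear function of a feature that was already kept."""
    kept = []
    for j, col in enumerate(columns):
        if all(mici(col, columns[k]) > threshold for k in kept):
            kept.append(j)
    return kept

# Usage: column 1 is exactly 2 * column 0, so it is dropped as redundant.
features = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
kept = select_features(features, threshold=1e-6)
```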
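The abstract says WSCiForest fuses per-sample scores with an entropy-weighted strategy but gives no formula. The sketch below applies the classic entropy weight method to several score columns. We assume these columns come from different base detectors. The abstract does not say what WSCiForest actually fuses, and the group-optimization part is omitted entirely. In this method, a column whose normalized scores are concentrated on a few samples has low entropy and therefore receives a larger weight. `fuse_scores` and the toy detector outputs are hypothetical.

```python
import math

def minmax(col):
    """Rescale a score column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0] * len(col)
    return [(v - lo) / (hi - lo) for v in col]

def entropy_weights(columns):
    """Classic entropy weight method: w_j = (1 - e_j) / sum_k (1 - e_k),
    where e_j = -(1 / ln n) * sum_i p_ij ln p_ij and p_ij = x_ij / sum_i x_ij."""
    n = len(columns[0])
    if n < 2:
        raise ValueError("need at least two samples")
    k = 1.0 / math.log(n)
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # no information: treat as maximal entropy
            continue
        # Convention: 0 * ln 0 = 0, so zero entries are skipped.
        e = -k * sum((v / total) * math.log(v / total) for v in col if v > 0)
        divergences.append(1.0 - e)
    s = sum(divergences)
    if s == 0:
        return [1.0 / len(columns)] * len(columns)  # fall back to equal weights
    return [d / s for d in divergences]

def fuse_scores(score_columns):
    """Hypothetical fusion: min-max normalize each detector's scores, then
    take the entropy-weighted sum per sample."""
    cols = [minmax(c) for c in score_columns]
    w = entropy_weights(cols)
    n = len(cols[0])
    fused = [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]
    return fused, w

# Usage: detector A singles out sample 3; detector B spreads scores evenly.
detector_a = [0.1, 0.1, 0.1, 0.9]
detector_b = [0.2, 0.4, 0.6, 0.8]
fused, weights = fuse_scores([detector_a, detector_b])
```

With these inputs, detector A receives the larger weight because its normalized scores have lower entropy.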
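The abstract does not specify how the improved Hausdorff distance is computed or applied. The sketch below shows the standard Hausdorff distance between two sample sets, H(A, B) = max(h(A, B), h(B, A)) with h(A, B) = max over a in A of min over b in B of d(a, b), using a feature-weighted Euclidean distance d. The feature weights `w` and the greedy rule in `drop_redundant` are hypothetical. Under that rule, a sample is dropped if the reduced set stays within Hausdorff distance `eps` of the original set.

```python
import math

def weighted_dist(a, b, w):
    """Feature-weighted Euclidean distance (weights w are an assumption)."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def directed_hausdorff(A, B, w):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(weighted_dist(a, b, w) for b in B) for a in A)

def hausdorff(A, B, w):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B, w), directed_hausdorff(B, A, w))

def drop_redundant(X, w, eps):
    """Hypothetical greedy reduction: remove sample i whenever the remaining
    samples still lie within Hausdorff distance eps of the full set X.
    Returns the indices of the kept samples (O(n^3), for illustration only)."""
    kept = list(range(len(X)))
    for i in range(len(X)):
        trial = [j for j in kept if j != i]
        if trial and hausdorff(X, [X[j] for j in trial], w) <= eps:
            kept = trial
    return kept

# Usage: samples 0 and 1 are near-duplicates, so one of them is removed.
X = [[0.0, 0.0], [0.0, 0.05], [5.0, 5.0]]
kept = drop_redundant(X, w=[1.0, 1.0], eps=0.1)
```

Because the reduced set is a subset of X, h(reduced, X) is always zero. The check is therefore governed by how far each original sample lies from its nearest kept sample.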
computer science, artificial intelligence