Label Noise Filtering Based on Data Distribution and Relative Density

Hang Liu, Shuai Bian, Chi Liu, Yong Qiu, Daizhi Lang, Zhiwen Liu, Jianjun Lei
DOI: https://doi.org/10.1109/PIC50277.2020.9350818
2020-01-01
Abstract: Label noise is a non-negligible problem in supervised and semi-supervised machine learning. Current methods for dealing with label noise fall mainly into algorithm-level robust modeling and data-level noise filtering. However, these methods filter noise either poorly or inefficiently. This paper proposes a label noise filtering method based on data distribution and relative density (DD-RDF). First, the region in which each sample lies is determined according to the sample distribution of the data set. Second, different filtering rules are applied to remove noise in different regions. Compared with the Data Distribution Filtering (DDF) algorithm, DD-RDF adopts relative density filtering rules that draw on additional neighborhood information in low-density, mixed-label regions, which makes the algorithm more robust. Experimental results on 12 standard UCI multi-class data sets show that DD-RDF filters noise better than the comparison algorithms under different noise ratios.
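The abstract does not define relative density or the region-specific filtering rules, so the following is only an illustrative sketch of the general idea, not the DD-RDF algorithm itself. It assumes one plausible definition: the ratio of a sample's mean distance to its k nearest same-label neighbors over its mean distance to its k nearest different-label neighbors. The function name `relative_density`, the parameter `k`, and the threshold of 1.0 are all hypothetical choices made for this example.

```python
import numpy as np

def relative_density(X, y, k=3):
    """Assumed score: mean distance to the k nearest same-label neighbors
    divided by mean distance to the k nearest different-label neighbors.
    A score above 1 means the sample sits closer to other classes than to
    its own, which hints at a noisy label. (Hypothetical definition.)"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distances via broadcasting.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    not_self = ~np.eye(n, dtype=bool)
    scores = np.full(n, np.nan)
    for i in range(n):
        same = np.sort(dist[i, (y == y[i]) & not_self[i]])[:k]
        other = np.sort(dist[i, y != y[i]])[:k]
        # Leave the score as NaN when either neighborhood is empty.
        if same.size and other.size:
            scores[i] = same.mean() / other.mean()
    return scores

# Two well-separated clusters; the last point lies inside the class-0
# cluster but carries label 1.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1],
          [5, 5], [5, 6], [6, 5], [6, 6],
          [0.5, 0.5]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = relative_density(X_demo, y_demo, k=3)
suspected = np.where(scores > 1.0)[0]  # threshold 1.0 is an assumption
print(suspected)
```

In this toy data only the mislabeled point scores above the assumed threshold. A full method along the lines the abstract describes would first partition samples into regions and apply a rule like this only where labels are mixed and density is low.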