A Self-Training Method Based on Density Peaks and an Extended Parameter-Free Local Noise Filter for K Nearest Neighbor

Junnan Li, Qingsheng Zhu, Quanwang Wu
DOI: https://doi.org/10.1016/j.knosys.2019.104895
IF: 8.139
2019-01-01
Knowledge-Based Systems
Abstract: Self-training is one of the relatively successful methodologies of semi-supervised classification. It can exploit both labeled and unlabeled data to train a satisfactory supervised classifier. Mislabeling is one of the largest challenges in self-training, and the most common technique for removing mislabeled samples is the local noise filter. However, existing local noise filters used in self-training methods suffer from two technical defects: they depend on parameters, and they use only labeled data to remove mislabeled samples. To address these shortcomings, this paper proposes a novel self-training method based on density peaks and an extended parameter-free local noise filter (STDPNF). In STDPNF, the self-training method based on density peaks is redesigned to be more suitable for combination with local noise filters. Moreover, a new local noise filter based on natural neighbors is proposed to filter out mislabeled instances. Compared with existing local noise filters used in self-training methods, the one in STDPNF is parameter-free and can remove mislabeled samples by exploiting the information in both labeled and unlabeled data. We focus on k nearest neighbor as the base classifier. In experiments, we verify that STDPNF improves the performance of the k nearest neighbor base classifier and that it can remove mislabeled instances efficiently even when labeled data are insufficient.
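To make the general idea concrete, the following is a minimal sketch of a generic self-training loop around a k nearest neighbor classifier with a simple neighbor-agreement filter. It is not the STDPNF algorithm: the abstract does not give its details, so the density-peak ordering is replaced here by a plain "closest to the labeled set first" rule, and the parameter-free natural-neighbor filter is replaced by a fixed-k vote. The function names `knn_predict` and `self_train`, the parameters `k` and `batch`, and the choice to discard filtered points are all assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Return the majority label among the k nearest (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def self_train(labeled, unlabeled, k=3, batch=2):
    """Generic self-training with a fixed-k neighbor-agreement filter.

    Illustrative only: the ordering rule and the filter are stand-ins
    (assumptions), not the density-peak or natural-neighbor steps of STDPNF.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Pseudo-label the points closest to the current labeled set first.
        pool.sort(key=lambda p: min(math.dist(p, x) for x, _ in labeled))
        chosen, pool = pool[:batch], pool[batch:]
        candidates = [(p, knn_predict(labeled, p, k)) for p in chosen]

        # Filter: a candidate is kept only if the kNN vote over the labeled
        # set plus the other new candidates agrees with its pseudo-label.
        merged = labeled + candidates
        for cand in candidates:
            others = [item for item in merged if item is not cand]
            if knn_predict(others, cand[0], k) == cand[1]:
                labeled.append(cand)
            # Rejected candidates are simply discarded in this sketch.
    return labeled
```

Calling `self_train` with a handful of labeled `(features, label)` tuples and a list of unlabeled feature tuples returns the enlarged labeled set, which can then be passed to `knn_predict` as training data. Note that the filter in this sketch votes over previously labeled points and newly pseudo-labeled ones, which only loosely mirrors the abstract's point about using both labeled and unlabeled information.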