Progressive Cluster Purification for Unsupervised Feature Learning

Yifei Zhang, Chang Liu, Yu Zhou, Wei Wang, Weiping Wang, Qixiang Ye
DOI: https://doi.org/10.48550/arXiv.2007.02577
2020-07-16
Abstract: In unsupervised feature learning, sample specificity based methods ignore the inter-class information, which deteriorates the discriminative capability of representation models. Clustering based methods are error-prone to explore the complete class boundary information due to the inevitable class inconsistent samples in each cluster. In this work, we propose a novel clustering based method, which, by iteratively excluding class inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training, while the sizes of clusters continuously expand consistently with the growth of model representation capability. With a well-designed cluster purification mechanism, it further purifies clusters by filtering noise samples which facilitate the subsequent feature learning by utilizing the refined clusters as pseudo-labels. Experiments on commonly used benchmarks demonstrate that the proposed PCP improves baseline method with significant margins. Our code will be available at https://github.com/zhangyifei0115/PCP.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to improve the discriminative ability of feature representations in unsupervised feature learning. Existing sample-specificity-based methods ignore inter-class information, which weakens the discriminative ability of the representation model. Clustering-based methods, in turn, struggle to capture complete class boundary information because each cluster inevitably contains class-inconsistent samples. To address both issues, the authors propose a new clustering-based method, Progressive Cluster Purification (PCP), which mitigates the influence of noisy samples in a simple and effective way by iteratively excluding class-inconsistent samples as clusters form. PCP gradually reduces the number of clusters during training while the clusters continuously grow in size, in step with the growth of the model's representational ability. PCP also includes a cluster purification mechanism that filters noisy samples out of the clusters; the refined clusters then serve as pseudo-labels for subsequent feature learning. Experiments on commonly used benchmarks show that PCP improves on the baseline method by significant margins.
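The general idea of shrinking the cluster count across rounds and filtering unreliable samples can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say which clustering algorithm or purification criterion PCP uses, so the plain k-means routine, the distance-to-centroid filter, the `keep_ratio` parameter, the `k_schedule` values, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(features)), labels]


def purify(labels, dist_to_center, keep_ratio=0.8):
    """Assumed purification rule: in each cluster keep the samples closest
    to the centroid and mark the rest as noise with label -1."""
    pseudo = labels.copy()
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        n_keep = max(1, int(np.ceil(keep_ratio * len(idx))))
        order = idx[np.argsort(dist_to_center[idx])]
        pseudo[order[n_keep:]] = -1
    return pseudo


def progressive_pseudo_labels(features, k_schedule=(8, 4, 2), keep_ratio=0.8):
    """One pseudo-label array per round, with fewer clusters each round."""
    rounds = []
    for k in k_schedule:
        labels, dist = kmeans(features, k)
        rounds.append(purify(labels, dist, keep_ratio))
        # A real pipeline would retrain the encoder on the purified
        # pseudo-labels here and re-extract features before the next round.
    return rounds
```

Calling `progressive_pseudo_labels(feats)` on an `(n_samples, n_dims)` float array returns one label array per round, where `-1` marks samples excluded as noise. In this sketch the features stay fixed between rounds; in the method described above, the representation is updated between rounds, which is what allows clusters to grow as the model improves.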