Does Confusion Really Hurt Novel Class Discovery?
Haoang Chi, Wenjing Yang, Feng Liu, Long Lan, Tao Qin, Bo Han
DOI: https://doi.org/10.1007/s11263-024-02012-y
IF: 13.369
2024-03-08
International Journal of Computer Vision
Abstract: When sampling data of specific classes (i.e., known classes) for a scientific task, collectors may encounter unknown classes (i.e., novel classes). Since these novel classes might be valuable for future research, collectors will also sample them and assign them to several clusters with the help of known-class data. This assignment process is known as novel class discovery (NCD). However, category confusion is common in the sampling process and may make NCD unreliable. To tackle this problem, this paper introduces a new and more realistic setting, named NCD under unreliable sampling (NUSA), in which collectors may misidentify known classes and even confuse known classes with novel classes. We find empirically that NUSA degrades existing NCD methods when sampling errors are not accounted for. To handle NUSA, we propose an effective solution, named hidden-prototype-based discovery network (HPDN): (1) we try to obtain relatively clean data representations even from the confusedly sampled data; (2) we propose a mini-batch K-means variant for robust clustering, which alleviates the negative impact of residual errors embedded in the representations by detaching the noisy supervision in a timely manner. Experiments demonstrate that, under NUSA, HPDN significantly outperforms competitive baselines (including the best baseline on CIFAR-10) and remains robust when encountering serious sampling errors.
computer science, artificial intelligence