Abstract: Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. Previous studies argued that parametric classifiers are prone to overfitting to seen categories, and endorsed using a non-parametric classifier formed with semi-supervised k-means. However, in this study, we investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. We hope the investigation and proposed simple framework can serve as a strong baseline to facilitate future studies in this field. Our code is available at: https://github.com/CVMI-Lab/SimGCD
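The two prediction biases named in the abstract can be made concrete with a small diagnostic. The sketch below is illustrative only and is not taken from the paper's code: the function names `seen_class_rate` and `imbalance_ratio`, the toy predictions, and the choice of a max/min count ratio as the imbalance measure are all assumptions.

```python
from collections import Counter


def seen_class_rate(predictions, seen_classes):
    """Fraction of predictions that land on a seen (labelled) class."""
    seen = set(seen_classes)
    return sum(1 for p in predictions if p in seen) / len(predictions)


def imbalance_ratio(predictions, num_classes):
    """Largest per-class prediction count divided by the smallest.

    Returns infinity when some class is never predicted.
    (Hypothetical measure chosen for illustration.)
    """
    counts = Counter(predictions)
    per_class = [counts.get(c, 0) for c in range(num_classes)]
    smallest = min(per_class)
    return float("inf") if smallest == 0 else max(per_class) / smallest


# Hypothetical toy predictions over 4 classes; classes 0 and 1 are "seen",
# classes 2 and 3 are "novel". The true split is assumed to be balanced.
preds = [0, 0, 0, 1, 1, 1, 0, 2, 1, 3]
print("seen-class rate:", seen_class_rate(preds, seen_classes=[0, 1]))
print("imbalance ratio:", imbalance_ratio(preds, num_classes=4))
```

If the data were evenly split across the four classes, a seen-class rate well above 0.5 would indicate the first bias, and an imbalance ratio well above 1 would indicate the second.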

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper "Parametric Classification for Generalized Category Discovery: A Baseline Study" addresses key issues in Generalized Category Discovery (GCD). The goal of GCD is to discover new categories in an unlabeled dataset while leveraging knowledge learned from a labeled dataset.

### Background and Motivation

1. **Generalized Category Discovery (GCD)**:
   - The goal of GCD is to discover new categories in an unlabeled dataset and classify them correctly while maintaining classification performance on known categories.
   - Previous studies argued that parametric classifiers tend to overfit to known categories, and therefore favored non-parametric classifiers (e.g., semi-supervised k-means based methods).
2. **Limitations of Existing Methods**:
   - Parametric classifiers perform poorly on new categories, mainly because of unreliable pseudo-labels and prediction bias.
   - Although non-parametric classifiers perform well in some cases, they are computationally expensive and cannot jointly optimize the hyperplanes for all categories.

### Research Objectives

1. **Re-examine the Reasons for the Failure of Parametric Classifiers**:
   - Through a series of experiments, the authors verify that previous design choices are effective when high-quality supervision is available, and identify unreliable pseudo-labels as a key cause of the performance degradation of parametric classifiers.
   - They show that two prediction biases exist: the classifier predicts seen classes more often, and its predictions are imbalanced across seen and novel categories.
2. **Propose a Simple and Effective Parametric Classification Method**:
   - The authors propose a parametric classification method based on entropy regularization, which performs well on multiple GCD benchmarks and remains robust when the number of classes is unknown.

### Main Contributions

1. **Re-evaluate the Design Choices of Parametric Classifiers**:
   - Through experimental analysis, the authors identify the key factors leading to the failure of parametric classifiers in GCD tasks.
2. **Propose a Simple and Effective Parametric Classification Method**:
   - This method combines entropy regularization and self-distillation techniques to generate more balanced pseudo-labels, achieving significant performance improvements on multiple GCD benchmarks.
3. **Challenge the Mainstream View of Non-Parametric Classifiers**:
   - The experimental results show that parametric classifiers, with appropriate design, can match or even surpass the performance of non-parametric classifiers.

### Experimental Results

- **Performance on Multiple Benchmark Datasets**:
  - On fine-grained datasets (e.g., CUB, Stanford Cars, FGVC-Aircraft) and general image recognition datasets (e.g., CIFAR100, ImageNet-100), the proposed method significantly outperforms existing SOTA methods in recognizing new categories, with a performance improvement of about 10%.
  - On more challenging datasets (e.g., Herbarium 19 and ImageNet-1K), the method also shows consistent performance improvements.

### Conclusion

By re-evaluating the design choices of parametric classifiers and proposing a simple and effective parametric classification method, the authors not only address the key issues in GCD tasks but also provide a strong baseline for future research.
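To illustrate why entropy regularization can yield more balanced pseudo-labels, here is a minimal sketch. It assumes the regulariser is the entropy of the batch-averaged prediction, subtracted from a base loss with a weight `epsilon`; this form, the helper names, and the toy logits are assumptions made for illustration, not the authors' released implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_prediction_entropy(batch_logits, temperature=1.0):
    """Entropy of the prediction averaged over the batch.

    It is largest when the batch as a whole spreads its predictions
    evenly over all classes, seen and novel alike.
    """
    probs = [softmax(row, temperature) for row in batch_logits]
    n, k = len(probs), len(probs[0])
    mean = [sum(p[j] for p in probs) / n for j in range(k)]
    return entropy(mean)


def regularised_loss(base_loss, batch_logits, epsilon=1.0):
    """Assumed form: base loss minus epsilon times the mean-entropy term.

    `epsilon` is a hypothetical weight, not a value from the paper.
    """
    return base_loss - epsilon * mean_prediction_entropy(batch_logits)


# Toy batches over 4 classes (hypothetical logits).
collapsed = [[8.0, 0.0, 0.0, 0.0]] * 4          # every sample picks class 0
balanced = [[8.0, 0.0, 0.0, 0.0],
            [0.0, 8.0, 0.0, 0.0],
            [0.0, 0.0, 8.0, 0.0],
            [0.0, 0.0, 0.0, 8.0]]               # one sample per class

print("collapsed:", mean_prediction_entropy(collapsed))
print("balanced: ", mean_prediction_entropy(balanced))
```

Under this assumed form, a batch whose predictions collapse onto one class gets a low mean-entropy and hence a higher regularised loss, while a batch that spreads its predictions evenly approaches the maximum entropy of log(number of classes) and is rewarded.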

Parametric Classification for Generalized Category Discovery: A Baseline Study

Prototypical Classifier with Distribution Consistency Regularization for Generalized Category Discovery: A Strong Baseline

Let’s Start Over: Retraining with Selective Samples for Generalized Category Discovery

Prediction Consistency Regularization for Generalized Category Discovery

Parametric Information Maximization for Generalized Category Discovery

ImbaGCD: Imbalanced Generalized Category Discovery

Generalized Categories Discovery for Long-tailed Recognition

A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization

Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery

Semisupervised Prior Free Rare Category Detection with Mixed Criteria

Happy: A Debiased Learning Framework for Continual Generalized Category Discovery

Active Generalized Category Discovery

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Unleashing the Potential of Model Bias for Generalized Category Discovery

Learning to Distinguish Samples for Generalized Category Discovery

Multimodal Generalized Category Discovery

CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery

Pseudo-supervised contrastive learning with inter-class separability for generalized category discovery

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

Contextuality Helps Representation Learning for Generalized Category Discovery

Generalized Category Discovery with Clustering Assignment Consistency