Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery

Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong
2024-10-25
Abstract: In this paper, we study a practical yet challenging task, On-the-fly Category Discovery (OCD), which aims to discover online the newly arriving stream data that belong to both known and unknown classes, by leveraging only the known-category knowledge contained in labeled data. Previous OCD methods employ a hash-based technique to represent old/new categories by hash codes for instance-wise inference. However, directly mapping features into a low-dimensional hash space not only inevitably damages the ability to distinguish classes but also causes a "high sensitivity" issue, especially for fine-grained classes, leading to inferior performance. To address these issues, we propose a novel Prototypical Hash Encoding (PHE) framework consisting of Category-aware Prototype Generation (CPG) and Discriminative Category Encoding (DCE) to mitigate the sensitivity of hash codes while preserving the rich discriminative information contained in the high-dimensional feature space, in a two-stage projection fashion. CPG enables the model to fully capture the intra-category diversity by representing each category with multiple prototypes. DCE boosts the discrimination ability of hash codes with the guidance of the generated category prototypes and the constraint of a minimum separation distance. By jointly optimizing CPG and DCE, we demonstrate that these two components are mutually beneficial towards effective OCD. Extensive experiments show the significant superiority of our PHE over previous methods, e.g., obtaining an improvement of +5.3% in ALL ACC averaged on all datasets. Moreover, due to the nature of the interpretable prototypes, we visually analyze the underlying mechanism of how PHE helps group certain samples into either known or unknown categories. Code is available at https://github.com/HaiyangZheng/PHE.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to effectively identify samples belonging to known and unknown categories in an incoming data stream, i.e., the On-the-fly Fine-Grained Category Discovery (OCD) task. Specifically, the paper addresses two main challenges:

1. **Requirement for real-time feedback**: The OCD task requires the system to provide immediate feedback for each newly arriving instance, which traditional offline clustering methods cannot do.
2. **Uncertainty in the open-world scenario**: The number of categories in the real world is not known in advance, so existing classifier-based methods, which assume that the number of categories is known or can be discovered beforehand, perform poorly on OCD.

The authors point out that existing methods such as SMILE directly map image features to a low-dimensional hash space. Although this yields category descriptors, it suffers from a "high sensitivity" problem that is especially pronounced for fine-grained categories. This sensitivity makes the category descriptors inaccurate and thus degrades classification performance.

Therefore, the authors propose a new framework, Prototypical Hash Encoding (PHE), to improve the intra-class compactness and inter-class separation of category descriptors while reducing the information loss caused by dimensionality reduction. The PHE framework consists of two parts:

- **Category-aware Prototype Generation (CPG)**: Captures intra-class diversity by learning multiple category-specific prototypes for each category.
- **Discriminative Category Encoding (DCE)**: Explicitly maps the generated prototypes to the corresponding category hash centers and enhances discriminative ability by enforcing a minimum separation distance between hash centers.
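The "high sensitivity" issue mentioned above can be made concrete with a toy example. The sketch below assumes that a category descriptor is obtained by taking the sign of each hash dimension and that two samples are grouped together only when their codes match exactly; both are simplifying assumptions for illustration, and the feature values and the helper name `hash_code` are hypothetical.

```python
import numpy as np

def hash_code(features: np.ndarray) -> tuple:
    """Binarize a real-valued hash feature by sign (assumed descriptor rule)."""
    return tuple(int(v) for v in np.where(features >= 0, 1, -1))

# Hypothetical 12-dimensional hash feature for one image (made-up values).
feat = np.array([0.9, -0.7, 0.02, 0.8, -0.6, 0.5,
                 -0.9, 0.4, 0.7, -0.3, 0.6, -0.8])

# A small perturbation on one dimension that lies close to zero,
# standing in for a minor change in pose or lighting.
noise = np.zeros_like(feat)
noise[2] = -0.05
perturbed = feat + noise

code_a = hash_code(feat)
code_b = hash_code(perturbed)

# Number of differing bits and whether the two samples share a descriptor.
hamming = sum(a != b for a, b in zip(code_a, code_b))
same_category = code_a == code_b
print(hamming, same_category)
```

Under these assumptions a single flipped bit is enough to assign the perturbed sample to a different descriptor, which is the kind of instability that matters most when categories differ only in subtle details.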
In addition, the authors design a center-separation loss to ensure that the hash centers of different categories are separated by a Hamming distance of at least \(d_{\text{max}}\), where \(d_{\text{max}}\) is derived from the Gilbert-Varshamov bound. Through these improvements, the PHE framework can alleviate the "high sensitivity" problem in hash encoding while preserving the discriminative information of the high-dimensional feature space, thereby improving classification accuracy for both known and unknown categories. Experimental results show that PHE significantly outperforms existing methods on multiple fine-grained datasets, with an average improvement of +5.3% in ALL ACC (accuracy over all categories).

### Formula Summary

- **Similarity Score Calculation**:
  \[ s_{i \to j} = g_{p_j}(z_i)=\log \left( \frac{1}{\|z_i - p_j\|_2^2+\epsilon} \right) \]
  where \(\epsilon\) is a small constant used for numerical stability.
- **Loss Function of the Prototype Generation Module**:
  \[ L_p=\frac{1}{|B|} \sum_{i \in B} \ell(y_i,\text{FC}(B(\theta) \cdot s_i)) \]
  where \(B\) represents a mini-batch of the support set, \(\ell\) is the standard cross-entropy loss, and \(y_i\) is the true label of image \(x_i\).
- **Loss Function of the Hash Encoding Module**:
  \[ L_f=\frac{1}{|B|} \sum_{i \in B} \ell(y_i,\text{sim}(b_i, h)) \]
  where \(\text{sim}(b_i, h)\) is the vector of cosine similarities between the hash feature \(b_i\) of image \(x_i\) and all hash centers \(h\).
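The similarity score and the hash-center comparison from the formula summary can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy vectors, and the value of `eps` are assumptions. The last helper shows one common way to read a separation distance off the Gilbert-Varshamov bound (the largest \(d\) with \(2^L / \sum_{j=0}^{d-1}\binom{L}{j} \ge N\) for code length \(L\) and \(N\) classes); the summary does not give the authors' exact derivation of \(d_{\text{max}}\), so treat that rule as an assumption as well.

```python
import math
import numpy as np

def prototype_similarity(z, prototypes, eps=1e-4):
    """s_{i->j} = log(1 / (||z_i - p_j||_2^2 + eps)) for every prototype p_j."""
    sq_dist = np.sum((prototypes - z) ** 2, axis=1)
    return np.log(1.0 / (sq_dist + eps))

def cosine_to_centers(b, centers):
    """Cosine similarity between one hash feature b_i and every hash center."""
    b_unit = b / np.linalg.norm(b)
    c_unit = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return c_unit @ b_unit

def gv_min_distance(code_length, num_classes):
    """Largest d such that the Gilbert-Varshamov bound guarantees num_classes
    binary codewords of this length with pairwise Hamming distance >= d.
    Returns 0 if even d = 1 cannot be guaranteed (assumed reading of d_max)."""
    best = 0
    for d in range(1, code_length + 1):
        volume = sum(math.comb(code_length, j) for j in range(d))
        if 2 ** code_length >= num_classes * volume:
            best = d
        else:
            break
    return best

# Toy feature and two hypothetical prototypes; the first coincides with z.
z = np.array([0.0, 0.0])
prototypes = np.array([[0.0, 0.0], [3.0, 4.0]])
scores = prototype_similarity(z, prototypes)

# Toy hash feature and two hypothetical +/-1 hash centers.
b = np.array([0.9, -0.8, 0.7, 0.6])
centers = np.array([[1.0, -1.0, 1.0, 1.0], [-1.0, 1.0, -1.0, 1.0]])
sims = cosine_to_centers(b, centers)

print(scores, sims, gv_min_distance(12, 100))
```

In this toy setup the nearest prototype receives the highest score and the hash feature is most similar to the first center. Feeding `sims` into a cross-entropy loss against the true label would correspond to the \(L_f\) term, under the assumption that the similarities are used directly as logits.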