A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting

Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen
2024-04-15
Abstract: Class agnostic counting (CAC) is a vision task that can be used to count the total occurrence number of any given reference objects in the query image. The task is usually formulated as a density map estimation problem through similarity computation among a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: Given a multi-class setting, models don't consider reference images and instead blindly match all dominant objects in the query image. Moreover, the current evaluation metrics and dataset cannot be used to faithfully assess the model's generalization performance and robustness. To this end, we discover that the combination of mosaic augmentation with generalized loss is essential for addressing the aforementioned issue of CAC models to count objects of majority (i.e. dominant objects) regardless of the references. Furthermore, we introduce a new evaluation protocol and metrics for resolving the problem behind the existing CAC evaluation scheme and better benchmarking CAC models in a more fair manner. Besides, extensive evaluation results demonstrate that our proposed recipe can consistently improve the performance of different CAC models. The code will be released upon acceptance.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor performance of existing Class-Agnostic Counting (CAC) models in multi-category scenarios. Specifically, it identifies three main problems:

1. **Reference Oversight (RO)**: Most existing CAC models ignore the reference images when processing the query image and instead blindly match all the dominant objects in the query image. As a result, they cannot accurately identify and count the target objects when several categories are present.
2. **Insufficiency of evaluation protocols**: Current CAC evaluation protocols and datasets (such as FSC-147) mainly cover single-category scenarios and cannot assess multi-category behavior. In addition, commonly used metrics (such as MAE and RMSE) are dominated by images with a large number of objects and therefore do not give a comprehensive picture of model performance.
3. **Limitations of training data**: Most existing CAC models are trained on single-category data, so they never learn to distinguish objects of different categories and perform poorly in multi-category scenarios.

To solve these problems, the paper proposes the following measures:

1. **Mosaic Augmentation (MA)**: Multiple images of different categories are spliced into one composite (mosaic) image, so the model sees multi-category data during training and generalizes better to multi-category scenarios.
2. **Generalized Loss Function (GL)**: A generalized loss based on unbalanced optimal transport (OT) is introduced to better capture the location of the target objects and improve the model's localization accuracy.
3. **New evaluation protocols and metrics**: A new multi-category mosaic evaluation dataset is proposed, together with metrics such as Normalized Absolute Error (NAE) and Squared Relative Error (SRE), to evaluate models more fairly and comprehensively.

Through these improvements, the paper aims to improve the robustness and accuracy of CAC models in multi-category scenarios and to provide a more reasonable evaluation framework.

### Formula display

1. **Generalized Loss Function (GL)**:

\[ L_{\tau}^C=\min_P\langle C, P\rangle-\varepsilon H(P)+\tau\|P\mathbf{1}_m - a\|_2^2+\tau\|P^\top\mathbf{1}_n - b\|_1 \]

where:
- \(C\) is the transport cost matrix between the pixels of the predicted density map and the ground-truth point annotations.
- \(P\) is the corresponding transport plan.
- \(H(\cdot)\) is the entropy regularization term, weighted by \(\varepsilon\).
- \(\tau\) weights the two penalty terms on the marginals of \(P\).
- \(n\) is the number of pixels.
- \(m\) is the number of annotation points.
- \(a\) is the predicted density map.
- \(b\) is the ground-truth point map.

2. **Perspective-Guided Transport Cost**:

\[ C_{ij}=\exp\left(\frac{\|x_i - y_j\|^2}{\eta(x_i, y_j)}\right) \]

where:
- \(x_i\) is the location of pixel \(i\) and \(y_j\) is the location of annotation point \(j\).
- \(\eta(x_i, y_j)\) is the adaptive perspective factor.

With these components, the paper reports improved performance of CAC models in multi-category scenarios.
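To make the two formulas above concrete, the following minimal sketch builds the cost matrix \(C\) and evaluates the expression inside the \(\min_P\) for one fixed transport plan \(P\). It is not the authors' implementation and it does not solve the minimization; the function names, the use of 2D coordinate tuples, the caller-supplied `eta` function, and the convention \(H(P) = -\sum_{ij} P_{ij}\log P_{ij}\) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def perspective_cost(pixels: Sequence[Point], points: Sequence[Point],
                     eta: Callable[[Point, Point], float]) -> List[List[float]]:
    """C[i][j] = exp(||x_i - y_j||^2 / eta(x_i, y_j)); eta is a hypothetical callback."""
    return [[math.exp(((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / eta(x, y))
             for y in points] for x in pixels]


def gl_objective(C: List[List[float]], P: List[List[float]],
                 a: Sequence[float], b: Sequence[float],
                 eps: float, tau: float) -> float:
    """Value of the GL objective for a fixed plan P (no minimization is done)."""
    n, m = len(a), len(b)
    transport = sum(C[i][j] * P[i][j] for i in range(n) for j in range(m))
    # Assumed entropy convention: H(P) = -sum P log P, with 0 * log 0 taken as 0.
    entropy = -sum(p * math.log(p) for row in P for p in row if p > 0)
    row_mass = [sum(P[i]) for i in range(n)]                        # P 1_m
    col_mass = [sum(P[i][j] for i in range(n)) for j in range(m)]   # P^T 1_n
    pixel_penalty = sum((r - ai) ** 2 for r, ai in zip(row_mass, a))  # squared L2
    point_penalty = sum(abs(c - bj) for c, bj in zip(col_mass, b))    # L1
    return transport - eps * entropy + tau * pixel_penalty + tau * point_penalty
```

The loss itself is the smallest value of `gl_objective` over all non-negative plans `P`, which in practice would be approximated by an iterative unbalanced optimal transport solver rather than evaluated for a single hand-picked plan.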
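The mosaic augmentation described above can be sketched as a 2x2 tiling of four single-category samples. This is only an illustration under stated assumptions: the summary does not specify the layout or the label handling, so the 2x2 grid, the equally sized single-channel tiles, the `make_mosaic` name, and the choice to keep only the reference category's density map as the training target are all hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def hstack(left: Grid, right: Grid) -> Grid:
    """Concatenate two grids of equal height side by side."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def zeros_like(grid: Grid) -> Grid:
    """Return a grid of zeros with the same shape as the input."""
    return [[0.0] * len(row) for row in grid]


def make_mosaic(images: Sequence[Grid], densities: Sequence[Grid],
                ref_index: int) -> Tuple[Grid, Grid]:
    """Tile four equally sized samples into a 2x2 mosaic image and target density."""
    if len(images) != 4 or len(densities) != 4:
        raise ValueError("expected exactly four tiles")
    # Assumption: only the tile whose category matches the reference exemplars
    # contributes to the target, so the other categories must not be counted.
    kept = [d if i == ref_index else zeros_like(d) for i, d in enumerate(densities)]
    mosaic_image = hstack(images[0], images[1]) + hstack(images[2], images[3])
    mosaic_density = hstack(kept[0], kept[1]) + hstack(kept[2], kept[3])
    return mosaic_image, mosaic_density
```

Under this assumed labeling, a model that blindly counts every dominant object in the mosaic is penalized, because the target density sums to the count of the reference category only.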
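The claim that MAE and RMSE are dominated by high-count images can be checked numerically. The sketch below computes the four metrics named above; the summary does not define NAE and SRE, so the definitions used here (absolute error divided by the ground-truth count, and squared error divided by the ground-truth count before averaging and taking the root) are assumptions, as is the `counting_errors` name.

```python
import math
from typing import Dict, Sequence


def counting_errors(pred: Sequence[float], true: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, NAE and SRE for per-image counts (true counts must be > 0)."""
    if len(pred) != len(true) or not true:
        raise ValueError("pred and true must be non-empty and of equal length")
    n = len(true)
    abs_err = [abs(p - t) for p, t in zip(pred, true)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Assumed definitions: each image's error is normalized by its own true count.
    nae = sum(e / t for e, t in zip(abs_err, true)) / n
    sre = math.sqrt(sum(e * e / t for e, t in zip(abs_err, true)) / n)
    return {"MAE": mae, "RMSE": rmse, "NAE": nae, "SRE": sre}
```

For example, with true counts `[10, 1000]` and predictions `[20, 900]`, the sparse image is off by 100% and the dense image by 10%, yet the dense image contributes 100 of the 110 total absolute error behind the MAE. Under the assumed NAE, the sparse image's relative error of 1.0 outweighs the dense image's 0.1.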