Abstract: Class agnostic counting (CAC) is a vision task that can be used to count the total occurrence number of any given reference objects in the query image. The task is usually formulated as a density map estimation problem through similarity computation among a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: Given a multi-class setting, models don't consider reference images and instead blindly match all dominant objects in the query image. Moreover, the current evaluation metrics and dataset cannot be used to faithfully assess the model's generalization performance and robustness. To this end, we discover that the combination of mosaic augmentation with generalized loss is essential for addressing the aforementioned issue of CAC models to count objects of majority (i.e. dominant objects) regardless of the references. Furthermore, we introduce a new evaluation protocol and metrics for resolving the problem behind the existing CAC evaluation scheme and better benchmarking CAC models in a more fair manner. Besides, extensive evaluation results demonstrate that our proposed recipe can consistently improve the performance of different CAC models. The code will be released upon acceptance.

What problem does this paper attempt to address?

The paper addresses the poor performance of existing class-agnostic counting (CAC) models in multi-category scenarios. It identifies three main problems:

1. **Reference Oversight (RO)**: Most existing CAC models ignore the reference images when processing the query image and instead blindly match all dominant objects in the query image. As a result, they cannot accurately identify and count the target objects in multi-category scenarios.
2. **Insufficient evaluation protocols**: Current CAC evaluation protocols and datasets (such as FSC-147) mainly cover single-category scenarios and cannot assess multi-category ones. In addition, commonly used metrics (such as MAE and RMSE) are easily dominated by images that contain a large number of objects, so they do not evaluate model performance comprehensively.
3. **Limitations of training data**: Most existing CAC models are trained on single-category data. They therefore perform poorly in multi-category scenarios, because they never learn to distinguish objects of different categories.

To solve these problems, the paper proposes the following measures:

1. **Mosaic Augmentation (MA)**: Multiple images of different categories are spliced into one composite (mosaic) image, so the model is exposed to multi-category data during training and generalizes better to multi-category scenarios.
2. **Generalized Loss Function (GL)**: A generalized loss based on unbalanced optimal transport is used to better capture the location information of the target objects and improve localization accuracy.
3. **New evaluation protocol and metrics**: A new multi-category mosaic evaluation dataset is proposed, together with metrics such as Normalized Absolute Error (NAE) and Squared Relative Error (SRE), to evaluate models more fairly and comprehensively.

Through these improvements, the paper aims to make CAC models more robust and accurate in multi-category scenarios and to provide a more reasonable framework for measuring their performance.

### Formula display

1. **Generalized Loss Function (GL)**:

\[ L_{\tau}^{C} = \min_{P} \langle C, P \rangle - \varepsilon H(P) + \tau \| P\mathbf{1}_m - a \|_2^2 + \tau \| P^\top \mathbf{1}_n - b \|_1 \]

where:
- \(C\) is the transport cost between the predicted density map and the ground-truth point annotations.
- \(P\) is the corresponding transport plan.
- \(H(\cdot)\) is the entropy regularization term.
- \(n\) is the number of pixels.
- \(m\) is the number of annotation points.
- \(a\) is the predicted density map.
- \(b\) is the ground-truth point map.
- \(\varepsilon\) and \(\tau\) weight the entropy term and the two marginal penalty terms, respectively.

2. **Perspective-Guided Transport Cost**:

\[ C_{ij} = \exp\left( \frac{\| x_i - y_j \|^2}{\eta(x_i, y_j)} \right) \]

where:
- \(\eta(x_i, y_j)\) is the adaptive perspective factor.

Through these methods, the paper shows how to significantly improve the performance of CAC models in multi-category scenarios.
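To make the mosaic augmentation idea concrete, here is a minimal sketch in plain Python. It assumes a fixed 2x2 layout of four equally sized tiles and assumes that only the tile matching the reference category contributes ground-truth points; the summary does not specify the layout or the label handling, so both are illustrative choices, and the function name `make_mosaic` is hypothetical. A real pipeline would use an array library and also handle resizing and density maps.

```python
def make_mosaic(tiles, points, target_index):
    """Splice four equally sized tiles into a 2x2 mosaic (illustrative sketch).

    tiles: four images, each a list of rows (a row is a list of pixels).
    points: four lists of (x, y) object annotations, one list per tile.
    target_index: which tile supplies the reference category (0..3).
    Returns the mosaic and the target tile's points in mosaic coordinates.
    """
    if len(tiles) != 4 or len(points) != 4:
        raise ValueError("expected exactly four tiles and four point lists")
    height, width = len(tiles[0]), len(tiles[0][0])
    if any(len(t) != height or any(len(r) != width for r in t) for t in tiles):
        raise ValueError("all tiles must share the same size")

    # Assumed layout: tile 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    top = [left + right for left, right in zip(tiles[0], tiles[1])]
    bottom = [left + right for left, right in zip(tiles[2], tiles[3])]
    mosaic = top + bottom

    # Shift the target tile's annotations by that tile's offset in the mosaic.
    # Points from the other three tiles are dropped: they act as distractors
    # that the model should NOT count for this reference (an assumption here).
    offsets = [(0, 0), (width, 0), (0, height), (width, height)]
    dx, dy = offsets[target_index]
    target_points = [(x + dx, y + dy) for x, y in points[target_index]]
    return mosaic, target_points
```

With this setup, a model that counts every dominant object regardless of the reference is penalized, because three of the four tiles contain objects that are absent from the target annotations.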
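The claim that MAE and RMSE are dominated by images with many objects can be illustrated with a short sketch. The summary does not give formulas for NAE and SRE, so the definitions below are an assumption based on the form commonly used in the counting literature: NAE averages the absolute error divided by the ground-truth count, and SRE is the square root of the average squared error divided by the ground-truth count. The function name `counting_metrics` is hypothetical.

```python
import math


def counting_metrics(gt_counts, pred_counts):
    """Return MAE, RMSE, NAE and SRE for per-image counts (assumed definitions)."""
    if not gt_counts or len(gt_counts) != len(pred_counts):
        raise ValueError("need two non-empty lists of equal length")
    if any(g <= 0 for g in gt_counts):
        raise ValueError("ground-truth counts must be positive for NAE/SRE")

    n = len(gt_counts)
    abs_err = [abs(g - p) for g, p in zip(gt_counts, pred_counts)]

    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Normalizing by the ground-truth count keeps dense images from dominating.
    nae = sum(e / g for e, g in zip(abs_err, gt_counts)) / n
    sre = math.sqrt(sum(e * e / g for e, g in zip(abs_err, gt_counts)) / n)
    return {"MAE": mae, "RMSE": rmse, "NAE": nae, "SRE": sre}


if __name__ == "__main__":
    # A sparse image (10 objects, 50% off) and a dense one (1000 objects, 10% off).
    print(counting_metrics([10, 1000], [5, 900]))
```

In this toy case the dense image contributes 100 of the 105 total absolute error, so MAE mostly reflects that image, while NAE weights the sparse image's 50% relative error more heavily than the dense image's 10%.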

A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting

Attention-Assisted Feature Comparison and Feature Enhancement for Class-Agnostic Counting

A Simple-but-effective Baseline for Training-free Class-Agnostic Counting

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

MACnet : Mask Augmented Counting Network for Class-Agnostic Counting

SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting

Scale-Aware Network with Regional and Semantic Attentions for Crowd Counting under Cluttered Background

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Enhancing Zero-shot Counting via Language-guided Exemplar Learning

DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction

Representing Domain-Mixing Optical Degradation for Real-World Computational Aberration Correction via Vector Quantization

ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting

CACrowdGAN: Cascaded Attentional Generative Adversarial Network for Crowd Counting

Diffusion-based Data Augmentation for Object Counting Problems

Contrastive Attraction and Contrastive Repulsion for Representation Learning

Crowd Counting Based on Multiscale Spatial Guided Perception Aggregation Network

Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting

Overcoming Statistical Shortcuts for Open-ended Visual Counting

Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks

A Lightweight and Robust Framework for Circulating Genetically Abnormal Cells (CACs) Identification Using 4-Color Fluorescence In Situ Hybridization (FISH) Image and Deep Refined Learning

Progressive Multi-resolution Loss for Crowd Counting