Abstract: Generalized counting has recently emerged as a way to count novel-class objects in a query image using only a few exemplars. Although methods based on exemplar-query pair matching have made impressive progress, they typically rely on a single correlation representation regardless of object size, which limits counting accuracy. In this paper, we introduce a novel and conceptually straightforward perspective to guide the design of a correlation mechanism that counts objects of diverse sizes more effectively. The perspective has three key aspects: (1) Small objects typically exhibit features concentrated within limited spatial regions, so an effective channel-wise correlation mechanism is important for counting them. (2) Large objects tend to possess rich spatial semantics, so an effective spatial-wise correlation mechanism is crucial for counting them. (3) Integrating channel-wise and spatial-wise correlation mechanisms has the potential to improve counting accuracy across object sizes. Building on this perspective, we first propose a simple yet effective Dual-level Channel-wise Correlation (DCC) module that uses kernel-wise correlation and distinct correlation to encode global-to-local channel-wise relationships, improving small-object counting accuracy. Second, we develop a 4D-convolution-based Spatial-aware Correlation (4DSC) module that extracts local-to-local spatial correlation in 4D space, improving large-object counting accuracy. Finally, we combine DCC and 4DSC into a Versatile Correlation Module (VCM) that processes small and large objects simultaneously, providing adaptability to object size diversity. Extensive experiments on the FSC-147 and CARPK datasets demonstrate the effectiveness of the proposed modules and the superior performance of our counting model.
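To make the two correlation types in the abstract concrete, the following is a minimal NumPy sketch, not the paper's DCC or 4DSC modules. It assumes a query feature map of shape (C, H, W) and an exemplar feature map of shape (C, h, w). The function names `channel_correlation` and `spatial_correlation_4d` are hypothetical, and the random arrays only stand in for backbone features. The channel-wise variant reweights query channels with a globally pooled exemplar descriptor (a global-to-local relationship), while the spatial variant builds a 4D tensor of cosine similarities between every query location and every exemplar location (a local-to-local relationship), which is the kind of tensor a 4D convolution could then process.

```python
import numpy as np


def channel_correlation(query, exemplar):
    """Reweight query channels by a globally pooled exemplar descriptor.

    query: (C, H, W) feature map; exemplar: (C, h, w) feature map.
    Returns a (C, H, W) map. Illustrative only, not the paper's DCC.
    """
    descriptor = exemplar.mean(axis=(1, 2))  # global average pool -> (C,)
    return query * descriptor[:, None, None]


def spatial_correlation_4d(query, exemplar, eps=1e-8):
    """Cosine similarity between every query and every exemplar location.

    Returns a 4D tensor of shape (H, W, h, w). Illustrative only, not the
    paper's 4DSC, which applies 4D convolutions on top of such a tensor.
    """
    # L2-normalise the channel vector at each spatial location.
    q = query / (np.linalg.norm(query, axis=0, keepdims=True) + eps)
    e = exemplar / (np.linalg.norm(exemplar, axis=0, keepdims=True) + eps)
    # Contract over channels: one similarity per (query, exemplar) location pair.
    return np.einsum("cij,ckl->ijkl", q, e)


# Hypothetical feature maps standing in for backbone outputs.
rng = np.random.default_rng(0)
query = rng.standard_normal((8, 16, 16))
exemplar = rng.standard_normal((8, 3, 3))

chan = channel_correlation(query, exemplar)
corr4d = spatial_correlation_4d(query, exemplar)
print(chan.shape, corr4d.shape)
```

The channel-wise output keeps the query's spatial resolution, whereas the 4D tensor grows with the product of the query and exemplar sizes, which is one reason the two are worth treating as separate mechanisms.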

Overcoming Statistical Shortcuts for Open-ended Visual Counting

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

CountGD: Multi-Modal Open-World Counting

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Counting Everyday Objects in Everyday Scenes

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

CountCLIP -- [Re] Teaching CLIP to Count to Ten

Learning to Count via Unbalanced Optimal Transport

CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting

Learning to Count without Annotations

CounTR: Transformer-based Generalised Visual Counting

Versatile correlation learning for size-robust generalized counting: A new perspective

DISCount: Counting in Large Image Collections with Detector-Based Importance Sampling

Towards zero-shot object counting via deep spatial prior cross-modality fusion

LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models

Iterative Object Count Optimization for Text-to-image Diffusion Models

TallyQA: Answering Complex Counting Questions

Learning Spatial Similarity Distribution for Few-shot Object Counting

Zero-Shot Object Counting with Language-Vision Models