Abstract: In this paper, we introduce the idea of blockwise classification to count objects. The current mainstream method for counting objects is to regress the density map or to regress the redundant count map via a deep convolutional neural network (CNN). However, these methods suffer from two critical issues: inaccurately generated regression targets and serious sample imbalances. First, the ground truth density map is generated by convolving the dot map with a Gaussian kernel. Because an inappropriate kernel can cover the background or leave objects uncovered, this approach introduces a form of noise and therefore results in ambiguities when training the networks. Second, objects are often inhomogeneously distributed in images, which gives rise to a data collection bias. This leads to a long-tailed distribution of region counts, a typical characteristic of imbalanced samples; as a result, underestimations in high-density regions and overestimations in low-density regions are common. In this paper, we address these two issues within one framework: blockwise count level classification. The intuition behind this idea is that while it may not be possible to provide an exact count of pixels or patches, it is possible to state with high confidence that the count of a region falls within a certain interval. Our method classifies the count level of each block, where the levels are produced by nonlinearly quantizing the continuous counts, thus transforming the imbalance of sample patch counts into a class imbalance of count levels. Consequently, an information-entropy-inspired loss can be applied to alleviate this issue. Through ablation studies, we analyze the impact of imbalanced data, Gaussian kernel sizes, quantization errors, and the effectiveness of each module in our method.
Without bells and whistles, our method outperforms or performs competitively with other state-of-the-art approaches on seven object-counting benchmarks, including four crowd-counting datasets (ShanghaiTech, WorldExpo'10, UCF-QNRF, and UCF_CC_50), one vehicle-counting dataset (TRANCOS), one maize-tassel-counting dataset (MTC), and one challenging sonar fish-counting dataset that we constructed. The results suggest that our framework provides a strong and improved baseline for object counting.
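
The quantization step described in the abstract can be illustrated with a small sketch. The abstract does not give the interval boundaries or the decoding rule, so everything specific here is an assumption made for illustration: the power-of-two edges in `EDGES`, the helper names `block_counts`, `quantize`, and `decode`, and the choice to decode each level to the mean integer count of its interval. It is not the authors' implementation, only one plausible instance of nonlinear count-level quantization.

```python
import numpy as np

# Hypothetical interval edges (not from the paper): level 0 is an empty block,
# then [1,2), [2,4), [4,8), [8,16), and a final open-ended level for >= 16.
EDGES = np.array([1, 2, 4, 8, 16])

def block_counts(dot_map, block):
    """Sum a dot map over non-overlapping block x block regions."""
    h, w = dot_map.shape
    h, w = h - h % block, w - w % block  # drop any ragged border
    trimmed = dot_map[:h, :w]
    return trimmed.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

def quantize(counts, edges=EDGES):
    """Map each block count to a count level (class index)."""
    # np.digitize returns i such that edges[i-1] <= count < edges[i]
    return np.digitize(counts, edges)

def decode(levels, edges=EDGES):
    """Map count levels back to a representative count per block."""
    # Mean integer count of each closed interval; the open-ended last
    # level is represented by its lower edge (an assumption).
    mids = (edges[:-1] + edges[1:] - 1) / 2.0
    centers = np.concatenate(([0.0], mids, [float(edges[-1])]))
    return centers[levels]
```

For example, calling `quantize(block_counts(dot_map, 4))` on a dot map yields one class label per 4x4 block, and summing `decode(...)` over all blocks gives an image-level estimate. The gap between that estimate and the true count is the quantization error the abstract refers to; wider intervals at high counts trade precision for fewer, better-populated classes.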

Open-world Text-specified Object Counting

CountGD: Multi-Modal Open-World Counting

CounTR: Transformer-based Generalised Visual Counting

CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

Zero-Shot Object Counting with Language-Vision Models

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

TFCounter: Polishing Gems for Training-Free Object Counting

AFreeCA: Annotation-Free Counting for All

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

SATCount: A scale-aware transformer-based class-agnostic counting framework

Counting Everyday Objects in Everyday Scenes

Overcoming Statistical Shortcuts for Open-ended Visual Counting

Training-free Object Counting with Prompts

KTCN: Enhancing Open-World Object Detection with Knowledge Transfer and Class-Awareness Neutralization

ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting

Iterative Object Count Optimization for Text-to-image Diffusion Models

Learning to Count without Annotations

A Practical Method for Counting Arbitrary Target Objects in Arbitrary Scenes

Counting Objects by Blockwise Classification

Zero-shot Object Counting with Good Exemplars

Towards zero-shot object counting via deep spatial prior cross-modality fusion