Dense Center-Direction Regression for Object Counting and Localization with Point Supervision

Domen Tabernik, Jon Muhovič, Danijel Skočaj
DOI: https://doi.org/10.1016/j.patcog.2024.110540
2024-08-27
Abstract: Object counting and localization problems are commonly addressed with point-supervised learning, which allows the use of less labor-intensive point annotations. However, learning based on point annotations poses challenges due to the high imbalance between the sets of annotated and unannotated pixels, which is often treated with Gaussian smoothing of point annotations and focal loss. These approaches still focus on the pixels in the immediate vicinity of the point annotations and exploit the rest of the data only indirectly. In this work, we propose a novel approach termed CeDiRNet for point-supervised learning that uses a dense regression of directions pointing towards the nearest object centers, i.e., center-directions. This provides greater support for each center point, arising from many surrounding pixels pointing towards the object center. We propose a formulation of center-directions that allows the problem to be split into the domain-specific dense regression of center-directions and the final localization task based on a small, lightweight, and domain-agnostic localization network that can be trained with synthetic data completely independent of the target domain. We demonstrate the performance of the proposed method on six different datasets for object counting and localization, and show that it outperforms the existing state-of-the-art methods. The code is accessible on GitHub at https://github.com/vicoslab/CeDiRNet.git
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses object counting and localization in computer vision, specifically when training relies on point supervision. These tasks arise wherever an application must determine both how many objects of a given category appear in an image and where they are, for example in industrial quality control or in counting vehicles, buildings, or ships in remote sensing imagery.

### Problem Background

Traditional object counting methods usually regress the count directly or estimate a density map, but neither approach provides the positions of individual objects, which many applications require. Bounding-box or segmentation-mask annotations carry more detailed information, but producing them is time-consuming and labor-intensive. Research has therefore shifted in recent years toward point annotations, which reduce the annotation workload while remaining accurate enough in some cases.

### Challenges of Existing Methods

Existing point-supervised methods face three main challenges:

1. **Imbalance between foreground and background pixels**: The annotated center points are far outnumbered by the unannotated background pixels, so training is prone to class imbalance.
2. **Limitations of directly regressing the center probability map or offset values**: These methods often focus only on pixels close to the center point and ignore the useful information provided by distant pixels.
3. **Requirement for complex post-processing**: Some methods need complex post-processing steps (such as clustering or Hough voting) to infer the object centers, which is inefficient when an image contains a large number of objects.
### Solutions Proposed in the Paper

To address these problems, the authors propose a new method named CeDiRNet, which performs object counting and localization by densely regressing center-directions, i.e., directions pointing to the nearest object center. Specifically:

- **Dense regression of center-directions**: For each pixel position in the image, the model predicts a direction pointing to the nearest object center. Each center is thus supported by many surrounding pixels over a larger area rather than only by its immediate neighborhood.
- **Lightweight, domain-independent localization network**: A small, lightweight, and domain-independent localization network determines the positions of the object centers from the regressed center-directions. This network can be trained with synthetic data completely independent of the target domain.
- **Reduced annotation workload**: Only point annotations are required for training, which greatly reduces the annotation workload.

### Method Advantages

1. **Larger support range**: Densely regressing center-directions lets the model draw useful information from pixels far from a center, improving the support for that center.
2. **Domain-independent localization network**: The localization network can be trained on synthetic data, completely independent of the target domain, which simplifies training and improves generalization.
3. **Reduced annotation workload**: Only point annotations are required, significantly reducing the annotation cost.

Through experiments on six different datasets, the authors show that the method outperforms existing state-of-the-art methods in object counting and localization.
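
The idea of a dense center-direction target can be illustrated with a minimal sketch that turns point annotations into a per-pixel field of directions toward the nearest center. This is not the authors' implementation: the function name `center_direction_field`, the `(x, y)` input format, and the choice to encode each direction as a unit vector are assumptions made for illustration only.

```python
import math


def center_direction_field(height, width, centers):
    """Build a height x width grid of unit vectors (dx, dy), each pointing
    from a pixel to its nearest annotated center.

    centers: list of (x, y) point annotations (hypothetical input format).
    A pixel that coincides with a center gets (0.0, 0.0).
    """
    if not centers:
        raise ValueError("at least one center is required")
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the nearest center by squared Euclidean distance.
            cx, cy = min(
                centers,
                key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2,
            )
            dx, dy = cx - x, cy - y
            norm = math.hypot(dx, dy)
            # Normalize so that only the direction is kept, not the distance.
            row.append((dx / norm, dy / norm) if norm > 0 else (0.0, 0.0))
        field.append(row)
    return field


# Two annotated centers on a small 5 x 8 image (illustrative values).
field = center_direction_field(5, 8, [(1, 2), (6, 2)])
```

In this sketch every pixel carries a training signal, including pixels far from any annotation, which is the property the summary describes as a larger support range. Recovering the centers from such a field is the job of the separate localization network and is not shown here.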