Counting Stacked Objects from Multi-View Images

Corentin Dumery, Noa Etté, Jingyi Xu, Aoxiang Fan, Ren Li, Hieu Le, Pascal Fua
2024-11-28
Abstract: Visual object counting is a fundamental computer vision task underpinning numerous real-world applications, from cell counting in biomedicine to traffic and wildlife monitoring. However, existing methods struggle to handle the challenge of stacked 3D objects in which most objects are hidden by those above them. To address this important yet underexplored problem, we propose a novel 3D counting approach that decomposes the task into two complementary subproblems: estimating the 3D geometry of the object stack and the occupancy ratio from multi-view images. By combining geometric reconstruction and deep learning-based depth analysis, our method can accurately count identical objects within containers, even when they are irregularly stacked. We validate our 3D Counting pipeline on diverse real-world and large-scale synthetic datasets, which we will release publicly to facilitate further research.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the problem of counting stacked three-dimensional objects from multi-view images. Existing methods struggle with stacked 3D objects because most objects are occluded by those in the upper layers, so only a small fraction of them is visible. The model must therefore infer the existence and number of hidden objects from limited visual cues.

#### Main problem description

1. **Occlusion problem**: When objects are stacked into 3D structures, many of them are occluded and cannot be seen directly. Traditional two-dimensional counting methods (such as object counting in a single image) therefore cannot accurately estimate the total number of objects.
2. **Complex stacking patterns**: The way objects are stacked, oriented, and arranged can be highly irregular, which makes counting harder.
3. **Industrial and agricultural needs**: Accurately counting stacked items (such as products on pallets or fruits in boxes) is crucial for preventing inventory errors, improving operational efficiency, and managing logistics.

#### Solution overview

To solve these problems, the authors propose a new 3D counting method that decomposes the task into two complementary subproblems:

1. **Estimating the 3D geometric structure of stacked objects**: Reconstruct the 3D geometry of the object stack from multi-view images.
2. **Estimating the occupancy ratio**: Predict, with deep learning, the fraction of the total volume that the objects occupy.

The final number of objects \( N \) is then computed as:

\[ N=\frac{\gamma V}{v} \]

where:

- \( V \) is the total volume of the container,
- \( v \) is the average volume of a single object,
- \( \gamma \) is the occupancy ratio, i.e. the fraction of the total volume occupied by objects.

This method combines geometric reconstruction with deep-learning-based depth analysis to count stacked 3D objects, and it is applicable to a variety of practical scenarios.
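
The counting formula \( N = \gamma V / v \) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `estimate_count`, its parameter names, the input checks, and the rounding to the nearest integer are assumptions made for illustration. In the actual pipeline, \( V \) would come from the multi-view 3D reconstruction and \( \gamma \) from the learned occupancy estimator, whereas here all three inputs are made-up numbers.

```python
def estimate_count(container_volume: float,
                   occupancy_ratio: float,
                   unit_volume: float) -> int:
    """Estimate the object count N = gamma * V / v.

    container_volume -- V, total volume of the container
    occupancy_ratio  -- gamma, fraction of V occupied by objects (0 to 1)
    unit_volume      -- v, average volume of one object (same units as V)

    Rounding to the nearest integer is an assumption of this sketch;
    the source only gives the real-valued formula.
    """
    if container_volume < 0:
        raise ValueError("container_volume must be non-negative")
    if not 0.0 <= occupancy_ratio <= 1.0:
        raise ValueError("occupancy_ratio must lie in [0, 1]")
    if unit_volume <= 0:
        raise ValueError("unit_volume must be positive")
    return round(occupancy_ratio * container_volume / unit_volume)


if __name__ == "__main__":
    # Hypothetical values: a 10,000 cm^3 box, half filled by volume,
    # holding objects of 25 cm^3 each.
    print(estimate_count(10_000.0, 0.5, 25.0))  # prints 200
```

Because \( V \) and \( v \) enter only as a ratio, they can be expressed in any unit as long as it is the same for both. The estimate scales linearly with \( \gamma \), so a relative error in the occupancy ratio produces the same relative error in the count.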