Towards Real-World Multi-View Object Classification: Dataset, Benchmark, and Analysis

Ren Wang, Tae Sung Kim, Jin-Sung Kim, Hyuk-Jae Lee
DOI: https://doi.org/10.1109/tcsvt.2024.3359681
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Aggregating information from multiple views is essential to accurately identifying similar objects. Nevertheless, existing datasets have limitations that hinder the development of practical multi-view object classification methods for real-world scenarios. These limitations include synthetic and coarse-grained objects and the absence of a validation split for standard hyperparameter tuning. This paper proposes a new dataset, MVP-N (Multi-View, Retail Products, Label Noise), which contains 16k real captured views and 9k multi-view sets collected from 44 retail products. In MVP-N, each view is annotated with a human-perceived information quantity (HPIQ) for analyzing how views are utilized in information aggregation. Moreover, the fine-grained categorization of objects gives rise to inter-class view similarity and intra-class view variance, enabling research on learning from noisy labels in multi-view images. Finally, a new soft label scheme, HS-HPIQ, is proposed that accounts for the hidden stratification phenomenon in multi-view images and achieves superior performance. To assess the effectiveness of MVP-N and the proposed HS-HPIQ, this study reviews 50 recent multi-view-based methods with respect to their practicality in real-world scenarios. Six feature aggregation methods and twelve soft label methods are benchmarked on MVP-N with an in-depth analysis. The dataset and code are publicly available at https://github.com/SMNUResearch/MVP-N.
engineering, electrical & electronic