Learning Disentangled Representation for Multi-View 3D Object Recognition.

Jingjia Huang, Wei Yan, Ge Li, Thomas Li, Shan Liu
DOI: https://doi.org/10.1109/tcsvt.2021.3062190
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: 3D object recognition is an active research topic. In particular, view-based methods, which represent a 3D object as a collection of its rendered 2D views, play an important role in this field. Current view-based research tends to aggregate information from multiple views via pooling-based strategies, which makes the models invariant to view permutation at the cost of an inevitable loss of useful features. In this paper, we introduce a new method that learns a more comprehensive descriptor for a 3D object from its views while remaining robust to variations in view permutation. Our method disentangles the information in the set of multi-view images into a global category-related feature and a set of view-permutation-related features. To separate these two parts, we propose an encoder-decoder based disentangling architecture, which adds little extra computation compared to the baseline model. Systematic experiments on the ModelNet40, ModelNet10, and ShapeNetCore55 datasets demonstrate the effectiveness and competitive performance of the new method. Code for our paper will be released soon at “https://github.com/hjjpku/multi_view_sort”.
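The trade-off the abstract attributes to pooling-based aggregation can be illustrated with a minimal sketch. This is not the authors' method: it shows only the generic baseline idea of element-wise max pooling over per-view features. The function name `max_pool_views` and the toy feature values are hypothetical, chosen for illustration.

```python
import random


def max_pool_views(view_features):
    """Element-wise max over a list of per-view feature vectors."""
    return [max(column) for column in zip(*view_features)]


# Hypothetical toy features: 4 views, 3 dimensions each.
views = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.4],
    [0.5, 0.5, 0.8],
    [0.0, 0.6, 0.1],
]

# Reorder the views with a fixed seed so the example is reproducible.
shuffled = views[:]
random.Random(0).shuffle(shuffled)

pooled = max_pool_views(views)
pooled_shuffled = max_pool_views(shuffled)

# The pooled descriptor is the same for any ordering of the views.
print(pooled == pooled_shuffled)
```

The pooled vector is identical for every view ordering, which is the permutation invariance described above. It also keeps only one value per dimension and no record of which view supplied it, which is one way to read the "loss of useful features" that the paper sets out to avoid.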