Modality-specific and hierarchical feature learning for RGB-D hand-held object recognition

Xiong Lv, Xinda Liu, Xiangyang Li, Xue Li, Shuqiang Jiang, Zhiqiang He
DOI: https://doi.org/10.1007/s11042-016-3375-5
IF: 2.577
2016-01-01
Multimedia Tools and Applications
Abstract: Hand-held object recognition is an important research topic in image understanding and plays an essential role in human-machine interaction. With RGB-D devices now easily available, depth information greatly improves object segmentation and provides an additional input channel. However, extracting a representative and discriminative feature from the object region, and making efficient use of the depth information, remain key to improving hand-held object recognition accuracy and, ultimately, the human-machine interaction experience. In this paper, we focus on a special but important area called RGB-D hand-held object recognition and propose a hierarchical feature learning framework for this task. First, our framework learns modality-specific features from RGB and depth images using CNN architectures with different network depths and learning strategies. Second, a high-level feature learning network is implemented to produce a comprehensive feature representation. Unlike previous work on feature learning and representation, the hierarchical learning method can fully exploit the characteristics of each modality and efficiently fuse them in a unified framework. Experimental results on the HOD dataset illustrate the effectiveness of the proposed method.
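The two-stage idea in the abstract (one feature extractor per modality, then a shared high-level network over the combined features) can be illustrated with a minimal sketch. This is not the authors' implementation: small fully connected layers with random weights stand in for the trained CNNs, and all layer sizes, the choice of which branch is deeper, and the names `make_branch`, `run_branch`, and `fused_feature` are assumptions made purely for illustration.

```python
import random


def dense(x, weights):
    """Apply one fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def init_layer(rng, n_in, n_out):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def make_branch(rng, sizes):
    """Build a stack of layers; len(sizes) - 1 sets the branch depth."""
    return [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]


def run_branch(x, layers):
    """Pass an input vector through every layer of a branch in order."""
    for weights in layers:
        x = dense(x, weights)
    return x


def fused_feature(rgb, depth, rgb_branch, depth_branch, fusion_layers):
    """Extract modality-specific features, concatenate, then fuse."""
    rgb_feat = run_branch(rgb, rgb_branch)
    depth_feat = run_branch(depth, depth_branch)
    joint = rgb_feat + depth_feat  # list concatenation of the two features
    return run_branch(joint, fusion_layers)


rng = random.Random(0)
# Hypothetical sizes: a three-layer RGB branch and a one-layer depth branch.
rgb_branch = make_branch(rng, [12, 8, 8, 6])
depth_branch = make_branch(rng, [4, 6])
# The fusion network sees the 6 + 6 concatenated features.
fusion_layers = make_branch(rng, [12, 5])

rgb_input = [rng.random() for _ in range(12)]
depth_input = [rng.random() for _ in range(4)]
feature = fused_feature(rgb_input, depth_input,
                        rgb_branch, depth_branch, fusion_layers)
```

Giving each modality its own branch lets the depth of each extractor be chosen independently, while the fusion stage operates only on the concatenated outputs; whether the paper fuses by concatenation is not stated in the abstract and is assumed here.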