MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition.

Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham
DOI: https://doi.org/10.1109/iccv.2015.134
2015-01-01
Abstract: Most of the feature-learning methods for RGB-D object recognition either learn features from color and depth modalities separately, or simply treat RGB-D as undifferentiated four-channel data, which cannot adequately exploit the relationship between different modalities. Motivated by the intuition that different modalities should contain not only some modal-specific patterns but also some shared common patterns, we propose a multi-modal feature learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities. In this way, we obtain features reflecting shared properties as well as modal-specific properties in different modalities. The information of the multi-modal learning frameworks is back-propagated to the early CNN layers. Experimental results show that our proposed multi-modal feature learning method outperforms state-of-the-art approaches on two widely used RGB-D object benchmark datasets.
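The idea of a fusion layer with a shared part and modal-specific parts can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the layer equations or the loss, so the linear projections, the squared-difference penalty that pulls the shared parts together, the averaging of the two shared parts, and all names (multimodal_layer, W_rgb, W_depth, n_shared) are assumptions made purely for illustration. The inputs stand in for features produced by the separate color and depth CNNs.

```python
import numpy as np

def multimodal_layer(f_rgb, f_depth, W_rgb, W_depth, n_shared):
    """Hypothetical fusion layer: project each modality, then split each
    projection into a shared part (first n_shared units) and a
    modal-specific part (remaining units)."""
    z_rgb = f_rgb @ W_rgb        # (batch, out_dim)
    z_depth = f_depth @ W_depth  # (batch, out_dim)

    shared_rgb, spec_rgb = z_rgb[:, :n_shared], z_rgb[:, n_shared:]
    shared_depth, spec_depth = z_depth[:, :n_shared], z_depth[:, n_shared:]

    # Assumed penalty: mean squared distance between the two shared parts.
    # Minimizing it encourages both modalities to agree on the common part.
    share_penalty = np.mean(np.sum((shared_rgb - shared_depth) ** 2, axis=1))

    # Assumed fusion: average the shared parts, keep both specific parts.
    shared = 0.5 * (shared_rgb + shared_depth)
    fused = np.concatenate([shared, spec_rgb, spec_depth], axis=1)
    return fused, share_penalty

# Toy stand-ins for CNN outputs: 4 samples, 8-d color and 6-d depth features.
rng = np.random.default_rng(0)
f_rgb = rng.normal(size=(4, 8))
f_depth = rng.normal(size=(4, 6))
W_rgb = rng.normal(size=(8, 5))
W_depth = rng.normal(size=(6, 5))

fused, penalty = multimodal_layer(f_rgb, f_depth, W_rgb, W_depth, n_shared=2)
print(fused.shape, penalty >= 0.0)
```

In a trainable version, a penalty of this kind would be added to the classification loss so that its gradient flows back into both CNN branches, which is one plausible reading of the back-propagation step described in the abstract.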