Surface Material Perception Through Multimodal Learning

Shi Mao, Mengqi Ji, Bin Wang, Qionghai Dai, Lu Fang
DOI: https://doi.org/10.1109/JSTSP.2022.3171682
IF: 7.695
2022-01-01
IEEE Journal of Selected Topics in Signal Processing
Abstract: Accurately perceiving object surface material is critical for scene understanding and robotic manipulation. However, the problem is ill-posed because the imaging process entangles material, lighting, and geometry in a complex way. Appearance-based methods cannot disentangle lighting and geometry variance and have difficulties in textureless regions. We propose a novel multimodal fusion method for surface material perception using a depth camera that projects structured laser dots. The captured active infrared image is decomposed into diffusive and dot modalities, and their connections with different material optical properties (i.e., reflection and scattering) are revealed separately. The geometry modality, which helps to disentangle material properties from geometry variations, is derived from the rendering equation and calculated from the depth image obtained by the structured light camera. Further, together with the texture feature learned from the gray modality, a multimodal learning method is proposed for material perception. Experiments on synthesized and captured datasets validate the orthogonality of the learned features. The final fusion method achieves 92.5% material accuracy, superior to state-of-the-art appearance-based methods (78.4%).