Lightweight Dual Stream Network With Knowledge Distillation for RGB-D Scene Parsing

Yuming Zhang, Wujie Zhou, Xiaoxiao Ran, Meixin Fang
DOI: https://doi.org/10.1109/lsp.2024.3378120
2024-03-27
IEEE Signal Processing Letters
Abstract: Significant progress has been made in the field of indoor scene parsing. Demand for lightweight networks is growing because mobile devices have limited hardware capacity. However, there has been little research on the design of lightweight networks for indoor scene parsing. Therefore, we propose a lightweight dual-stream network (LDSNet) with knowledge distillation (KD) for RGB-D indoor scene parsing. First, we developed a two-stream network in three versions for different scenarios: LDSNet-tiny*, LDSNet-small*, and LDSNet-base, where * denotes a model obtained after KD. In the main stream, we designed an integrated joint enhancement module that captures valuable information from both RGB and depth features. This information is then processed by the cascading integration module to generate the final map. To improve the performance of the model, we included an auxiliary extraction module in the auxiliary stream to extract feature information specifically for KD. During training, we used a hierarchical context loss to distill features and obtain LDSNet-tiny* and LDSNet-small*. Experiments on the NYUDv2 and SUN RGB-D datasets demonstrated that LDSNet-base achieves superior results, while LDSNet-tiny* and LDSNet-small* also exhibit satisfactory performance.
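The abstract does not define the hierarchical context loss, so the sketch below only illustrates the general idea of knowledge distillation that it builds on: a smaller student model is trained to match the output distribution of a larger teacher. It uses the generic temperature-softened KL-divergence loss for a single pixel's class logits, not the paper's feature-level loss. The function names (`softmax`, `kd_loss`) and the temperature value are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-label distillation loss (hypothetical helper).

    Computes KL(teacher || student) on temperature-softened class
    distributions and scales by T^2, as is common practice so that
    gradient magnitudes stay comparable across temperatures.
    This is NOT the paper's hierarchical context loss.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


# Example: logits for one pixel over three scene classes (made-up values).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.5]
loss = kd_loss(student, teacher)
```

In a segmentation setting this per-pixel term would typically be averaged over all pixels and combined with the ordinary cross-entropy loss against the ground-truth labels; how LDSNet weights or structures these terms is not stated in the abstract.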
engineering, electrical & electronic