Integrating Semantic Segmentation Model for Self-Supervised Scene Flow Estimation Via Cross Task Distillation

Bayram Bayramli, Yue Ding, Hongtao Lu
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650154
2024-01-01
Abstract: This paper introduces a self-supervised learning approach for monocular scene flow estimation. It addresses two obstacles: the reliance on expensive 3D sensing technologies such as LiDAR and RGB-D cameras, and the scarcity of datasets with ground-truth labels. Our method uses cross-task distillation, in which a semantic segmentation network serves as a teacher that imparts information to the scene flow network. To enable effective information exchange between two inherently different tasks, we incorporate self-attention blocks within each network. Specifically, we transfer self-attention weights from the semantic segmentation network to the scene flow network by aligning the attention probabilities of the two networks. The self-attention mechanisms make the framework more adaptable to complex scene structures, contributing to robust and accurate scene flow estimation. Quantitative and qualitative experiments validate the approach, reducing the error metric from 30.97% to 28.50%, a relative improvement of approximately 8% over the best-performing existing self-supervised scene flow method.
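The attention alignment described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss used, so the KL divergence below is an assumption, as are all function names, the toy query and key vectors, and the choice of scaled dot-product attention. The sketch only shows the general idea of pulling a student's attention probabilities toward a teacher's.

```python
import math

def softmax(scores):
    """Turn one row of raw attention scores into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_probs(queries, keys):
    """Scaled dot-product attention probabilities, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def attention_distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """Mean KL(teacher || student) over query rows.

    Assumed loss form: the paper may use a different alignment objective.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_probs, student_probs):
        total += sum(
            t * math.log((t + eps) / (s + eps)) for t, s in zip(t_row, s_row)
        )
    return total / len(teacher_probs)

# Hypothetical toy features: two tokens with two channels each.
teacher_q = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for segmentation (teacher) queries
teacher_k = [[1.0, 0.0], [0.0, 1.0]]
student_q = [[0.5, 0.5], [0.2, 0.8]]  # stand-in for scene flow (student) queries
student_k = [[1.0, 0.0], [0.0, 1.0]]

teacher_attn = attention_probs(teacher_q, teacher_k)
student_attn = attention_probs(student_q, student_k)
loss = attention_distillation_loss(teacher_attn, student_attn)
```

In a training loop, a term like `loss` would be added to the self-supervised scene flow objective with the teacher kept fixed, so that only the student's attention maps move. That weighting and the frozen teacher are also assumptions, not details taken from the abstract.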