W6DNet: Weakly Supervised Domain Adaptation for Monocular Vehicle 6-D Pose Estimation With 3-D Priors and Synthetic Data
Yangxintong Lyu, Remco Royen, Adrian Munteanu
DOI: https://doi.org/10.1109/tim.2024.3363789
IF: 5.6
2024-03-02
IEEE Transactions on Instrumentation and Measurement
Abstract: Synthetic traffic datasets provide highly accurate and affordable annotations, which are of crucial importance in complex vision-based perception tasks performed on real-world traffic data. Due to the lack of paired 2-D–3-D data, adapting knowledge of a vehicle's pose in SE(3) given its known 3-D geometry remains very challenging. In this article, we first propose a synthetic dataset, SynthV6D, enabling 6-D pose estimation of vehicles in monocular traffic images. The dataset comprises industrial-grade vehicles moving through realistic virtual scenery, covering a wide range of viewpoints and distances. Second, we introduce a weakly supervised domain adaptation approach, dubbed W6DNet, to recover the 6-D pose. To this end, a novel linked image feature space-based domain adaptation is introduced that uses the synthetic dataset. Furthermore, an original two-step double-fusion block is proposed to fuse the multi-modal data representations and the cross-space features. Consequently, the proposed method learns pose-specific embeddings. We evaluate W6DNet on the real-world ApolloCar3D dataset. Extensive experimental results demonstrate that, when a small amount of real-world data is accessible, the proposed approach can significantly improve performance when adapting knowledge from SynthV6D. Moreover, it achieves competitive performance compared to fully supervised state-of-the-art methods. The code is available at https://github.com/YangLyu-123/TIM-W6DNet.git.
engineering, electrical & electronic; instruments & instrumentation