Learning shared template representation with augmented feature for multi-object pose estimation

Qifeng Luo,Ting-Bing Xu,Fulin Liu,Tianren Li,Zhenzhong Wei
DOI: https://doi.org/10.1016/j.neunet.2024.106352
Abstract:Template matching pose estimation methods based on deep learning have made significant advancements via metric learning or reconstruction learning. Existing approaches primarily build distinct template representation libraries (codebooks) from rendered images for each object, which complicate the training process and increase memory cost for multi-object tasks. Additionally, they struggle to effectively handle discrepancies between the distributions of training and test sets, particularly for occluded objects, resulting in suboptimal matching accuracy. In this study, we propose a shared template representation learning method with augmented semantic features to address these issues. Our method learns representations concurrently using metric and reconstruction learning as similarity constraints, and augments response of network to objects through semantic feature constraints for better generalization performance. Furthermore, rotation matrices serve as templates for codebook construction, leading to excellent matching accuracy compared to rendered images. Notably, it contributes to the effective decoupling of object categories and templates, necessitating the maintenance of only a shared codebook in multi-object pose estimation tasks. Extensive experiments on Linemod, Linemod-Occluded and TLESS datasets demonstrate that the proposed method employing shared templates achieves superior matching accuracy. Moreover, proposed method exhibits robustness on a collected aircraft dataset, further validating its efficacy.
What problem does this paper attempt to address?