Abstract: Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses robust object-instance re-identification (ReID) in multimodal scenarios and, building on it, global localization. Specifically:

1. **Object Instance Re-identification (ReID)**:
   - **Background**: Existing research focuses mainly on re-identifying pedestrians and vehicles, with far less attention given to general object instances.
   - **Challenges**: Objects vary widely in structure, appearance, and type, and they lack unified features. Changes in environmental conditions, such as lighting variations and occlusions, further reduce re-identification accuracy.
2. **Global Localization**:
   - **Background**: Global relocalization is a key task in robot navigation, especially in repetitive scenes or multi-room environments. Traditional methods often rely on point cloud alignment or large sets of images, which may contain redundant information and perform poorly in complex environments.
   - **Challenges**: Achieving high-precision indoor global localization from object-instance re-identification, without manual annotation.

### Solutions

To address these problems, the paper proposes the following:

1. **Dual-path Attention Transformer (DATOR)**:
   - **Architecture**: DATOR is a dual-path transformer that combines RGB and depth information. The two modality paths exchange and fuse information to produce a robust final embedding vector.
   - **Advantages**: DATOR maintains high re-identification accuracy across viewpoints and environmental conditions, and performs particularly well under lighting variations and in complex scenes.
2. **Object Instance-based Global Localization Framework**:
   - **Process**: The framework first constructs an object instance-based map, then determines the pose of a query RGB-D image by matching its object instances against those in the map.
   - **Innovation**: The framework requires no manual annotation, operates in diverse indoor environments, and achieves high-precision localization through optimized alignment.

### Experimental Validation

- **Datasets**: The paper validates its methods on two self-built RGB-D datasets (DATOR-lab and DATOR-synth) and on the open-source TUM RGB-D dataset.
- **Results**: DATOR significantly outperforms other methods on multiple datasets, reaching an mAP of 75.18 for object instance re-identification and a global localization success rate of 83.01% on the TUM RGB-D dataset.

### Conclusion

By proposing the DATOR model and the object instance-based global localization framework, the paper addresses robust object instance re-identification and high-precision global localization in multimodal scenarios, offering new ideas and technical support for autonomous indoor navigation.
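
The localization process summarized above (match the query's object instances against an object-instance map, then align) can be illustrated with a small sketch. This is not the paper's implementation. It assumes each object is represented by a ReID embedding plus a position, simplifies the pose to a planar `(yaw, tx, ty)` transform, and swaps in greedy cosine-similarity matching and a closed-form 2D least-squares alignment for whatever matching and optimization the authors actually use. All function names and the `min_sim` threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_instances(query, object_map, min_sim=0.8):
    """Greedy one-to-one matching of query objects to map objects.

    query / object_map: lists of (embedding, (x, y)) tuples, where the
    query positions are in the camera frame and the map positions are
    in the map frame. min_sim is an assumed similarity threshold.
    Returns a list of (query_xy, map_xy) pairs.
    """
    candidates = []
    for qi, (q_emb, _) in enumerate(query):
        for mi, (m_emb, _) in enumerate(object_map):
            sim = cosine(q_emb, m_emb)
            if sim >= min_sim:
                candidates.append((sim, qi, mi))
    candidates.sort(reverse=True)  # most similar pairs first

    used_q, used_m, pairs = set(), set(), []
    for _, qi, mi in candidates:
        if qi in used_q or mi in used_m:
            continue  # each object is matched at most once
        used_q.add(qi)
        used_m.add(mi)
        pairs.append((query[qi][1], object_map[mi][1]))
    return pairs


def estimate_planar_pose(pairs):
    """Least-squares rigid transform mapping camera-frame points onto
    their matched map-frame points. Returns (yaw, tx, ty)."""
    if len(pairs) < 2:
        raise ValueError("need at least two matched objects")
    n = len(pairs)
    qcx = sum(q[0] for q, _ in pairs) / n
    qcy = sum(q[1] for q, _ in pairs) / n
    mcx = sum(m[0] for _, m in pairs) / n
    mcy = sum(m[1] for _, m in pairs) / n

    s_dot = s_cross = 0.0
    for (qx, qy), (mx, my) in pairs:
        ax, ay = qx - qcx, qy - qcy  # centered query point
        bx, by = mx - mcx, my - mcy  # centered map point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    yaw = math.atan2(s_cross, s_dot)
    c, s = math.cos(yaw), math.sin(yaw)
    tx = mcx - (c * qcx - s * qcy)
    ty = mcy - (s * qcx + c * qcy)
    return yaw, tx, ty
```

Calling `match_instances(query, object_map)` and passing the result to `estimate_planar_pose` gives the transform that places the camera-frame observations in the map frame. A full 6-DoF version would use 3D object positions and a 3D alignment step in place of the planar one.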

Towards Global Localization using Multi-Modal Object-Instance Re-Identification
