ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification

Chen Mao, Chong Tan, Jingqi Hu, Min Zheng
2024-10-13
Abstract: Person re-identification (ReID), as a crucial technology in the field of security, plays a vital role in safety inspections, personnel counting, and more. Most current ReID approaches extract features primarily from images, which are easily affected by objective conditions such as clothing changes and occlusions. In addition to cameras, we leverage widely available routers as sensing devices, capturing pedestrians' gait information through the Channel State Information (CSI) in WiFi signals, and we contribute a multimodal dataset. We employ a two-stream network to handle the video understanding and signal analysis tasks separately, and we apply multimodal fusion and contrastive learning to pedestrian video and WiFi data. Extensive experiments in real-world scenarios demonstrate that our method effectively uncovers the correlations between heterogeneous data, bridges the gap between the visual and signal modalities, significantly expands the sensing range, and improves ReID accuracy across multiple sensors.
Computer Vision and Pattern Recognition, Information Retrieval
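The abstract describes the pipeline only at a high level. As a rough illustration of how a two-stream design can support ReID across sensors, the minimal Python sketch below assumes that each stream (a video encoder and a WiFi CSI encoder, neither shown) has already mapped its input to a fixed-length embedding in a shared space. The weighted-average `fuse` function and the cosine-similarity ranking are illustrative assumptions, not the paper's fusion module, and every name and toy value here (`fuse`, `rank_gallery`, the gallery vectors) is hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embeddings."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def fuse(video_emb: Sequence[float], wifi_emb: Sequence[float], weight: float = 0.5) -> Vector:
    """Hypothetical late fusion: weighted average of the two normalized stream outputs."""
    v, w = l2_normalize(video_emb), l2_normalize(wifi_emb)
    mixed = [weight * x + (1.0 - weight) * y for x, y in zip(v, w)]
    return l2_normalize(mixed)


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery identities ordered from most to least similar to the query."""
    scored = [(cosine(query, emb), pid) for pid, emb in gallery.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in scored]


if __name__ == "__main__":
    # Toy embeddings standing in for video-stream outputs of three people.
    video_gallery = {"p1": [1.0, 0.0, 0.1], "p2": [0.0, 1.0, 0.0], "p3": [0.6, 0.6, 0.0]}
    # Toy embedding standing in for a WiFi-stream output, e.g. from a camera-free area.
    wifi_query = [0.9, 0.1, 0.0]
    print(rank_gallery(wifi_query, video_gallery))  # ['p1', 'p3', 'p2']
    # When both modalities observe the same person, a fused query can be used instead.
    print(rank_gallery(fuse([1.0, 0.1, 0.0], wifi_query), video_gallery))
```

Ranking a WiFi-derived query against a video gallery in this way only makes sense once training has aligned the two embedding spaces, which is the role the abstract assigns to contrastive learning.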
What problem does this paper attempt to address?
The problem this paper addresses is the challenge of person re-identification (ReID), in particular how to improve ReID accuracy and robustness when visual information is limited. Traditional image-based ReID methods are easily affected by objective conditions such as illumination, occlusion, and changes in pedestrians' clothing, which degrades recognition. In addition, cameras have a limited deployment range and cannot cover every scenario. To address these problems, the authors propose ViFi-ReID, a multimodal person re-identification method that combines vision with WiFi Channel State Information (CSI). By using widely deployed routers to capture pedestrians' gait information and fusing it with video data, the method mitigates the limitations of purely visual ReID, expands the sensing range, and improves ReID accuracy.

### Main problems and solutions

1. **Limitations of visual information**: Traditional image-based ReID methods perform poorly under illumination changes, occlusion, and clothing changes.
   - **Solution**: Introduce WiFi signals as a supplementary modality and use the gait information they carry to strengthen recognition.
2. **Limited camera deployment range**: Cameras cannot cover every scenario, and some places have no cameras at all.
   - **Solution**: Use widely deployed routers as sensing devices to capture WiFi signals, so that person re-identification can also be carried out in areas without cameras.
3. **Fusion of multimodal data**: Visual and WiFi signals are complementary and correlated, so fusing the two modalities effectively is a key problem.
   - **Solution**: Design a two-stream network architecture that handles the video understanding and signal analysis tasks separately, and correlate the modalities through multimodal fusion and contrastive learning.
4. **Cross-modal matching**: An effective matching relationship must be established between the modalities to achieve more accurate person re-identification.
   - **Solution**: Adopt contrastive learning. Minimizing the distance between matching pairs and maximizing the distance between non-matching pairs strengthens the model's cross-modal matching ability.

### Main contributions of the paper

- A multimodal person re-identification method, ViFi-ReID, that combines vision and WiFi signals.
- A multimodal dataset, ViFi-Indoors, containing pedestrian videos and WiFi data.
- A two-stream network architecture that processes videos and WiFi signals separately and improves ReID performance through multimodal fusion and contrastive learning.
- Experiments in multiple scenarios that verify the method's effectiveness, with significant improvements in ReID accuracy and robustness.

In conclusion, the paper addresses the limitations of existing ReID methods in complex environments by combining vision and WiFi signals, providing a more robust person re-identification solution with a wider sensing range.
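The contrastive objective described under cross-modal matching can be illustrated with the classic margin-based contrastive loss, shown below as a dependency-free Python sketch. This is an assumed stand-in: the summary does not say which contrastive loss ViFi-ReID actually uses, and the function names, the Euclidean distance, and the default `margin` are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_loss(video_emb: Sequence[float], wifi_emb: Sequence[float],
              is_match: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one video/WiFi pair (hypothetical helper).

    Matching pair: penalize the squared distance (pull together).
    Non-matching pair: penalize only if closer than `margin` (push apart).
    """
    d = euclidean(video_emb, wifi_emb)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_loss(video_batch: List[Sequence[float]], video_ids: List[str],
               wifi_batch: List[Sequence[float]], wifi_ids: List[str],
               margin: float = 1.0) -> float:
    """Average the pair loss over every cross-modal pair in a batch.

    A pair counts as matching when both samples carry the same person ID.
    """
    total, count = 0.0, 0
    for v, v_id in zip(video_batch, video_ids):
        for w, w_id in zip(wifi_batch, wifi_ids):
            total += pair_loss(v, w, v_id == w_id, margin)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    videos = [[0.0, 0.0], [1.0, 0.0]]
    wifis = [[0.0, 0.0], [1.0, 0.0]]
    ids = ["a", "b"]
    # Matches sit at distance 0; non-matches sit at distance 1, inside margin 2.
    print(batch_loss(videos, ids, wifis, ids, margin=2.0))  # 0.5
```

In this form, matching pairs are pulled together by the squared-distance term, while non-matching pairs contribute only when they fall inside the margin, so negatives that are already well separated add nothing to the loss. Many multimodal systems instead use a batch-wise InfoNCE-style objective; which form the paper uses is not stated here.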