Abstract: We address multi-view pedestrian detection in a setting where labeled data is collected using a multi-camera setup different from the one used for testing. While recent multi-view pedestrian detectors perform well on the camera rig used for training, their performance declines when applied to a different setup. To facilitate seamless deployment across varied camera rigs, we propose an unsupervised domain adaptation (UDA) method that adapts the model to new rigs without requiring additional labeled data. Specifically, we leverage the mean teacher self-training framework with a novel pseudo-labeling technique tailored to multi-view pedestrian detection. This method achieves state-of-the-art performance on multiple benchmarks, including MultiviewX → Wildtrack. Unlike previous methods, our approach eliminates the need for external labeled monocular datasets, thereby reducing reliance on labeled data. Extensive evaluations demonstrate the effectiveness of our method and validate key design choices. By enabling robust adaptation across camera setups, our work enhances the practicality of multi-view pedestrian detectors and establishes a strong UDA baseline for future research.
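In a mean teacher framework, the detector exists in two copies: a student updated by gradient descent and a teacher whose weights are an exponential moving average (EMA) of the student's, t ← α·t + (1 − α)·s. The abstract does not state MVUDA's exact update rule or hyperparameters, so the following is only a minimal sketch of the standard EMA step. It assumes weights stored as plain Python lists keyed by parameter name; the function name `ema_update` and the default α = 0.99 are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, List

# Hypothetical weight container: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, alpha: float = 0.99) -> Params:
    """Return new teacher weights: t <- alpha * t + (1 - alpha) * s.

    alpha = 0.99 is an assumed default, not a value reported for MVUDA.
    The input dictionaries are left unmodified.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    updated: Params = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]
        if len(s_vals) != len(t_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move each teacher weight a small step toward the matching student weight.
        updated[name] = [alpha * t + (1.0 - alpha) * s for t, s in zip(t_vals, s_vals)]
    return updated


# Usage: after each student optimization step, refresh the teacher.
teacher = {"w": [0.0, 2.0]}
student = {"w": [1.0, 0.0]}
teacher = ema_update(teacher, student, alpha=0.75)
print(teacher)  # {'w': [0.25, 1.5]}
```

Because the teacher changes slowly, its predictions on the unlabeled target rig tend to be more stable than the student's, which is why mean teacher self-training uses it as the source of pseudo-labels.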

What problem does this paper attempt to address?

This paper addresses the performance drop of multi-view pedestrian detectors when the camera setup used at test time differs from the one used during training. To solve this problem, the paper proposes an unsupervised domain adaptation (UDA) method that adapts the model to new camera setups without using additional labeled data.

### Main problems

1. **Generalization across camera setups**: Existing multi-view pedestrian detection methods generalize poorly to different camera setups because they are trained on labeled data from one specific setup.
2. **Scarcity of labeled data**: Multi-view labeled datasets are scarce and costly to annotate, which limits how well models generalize.

### Solutions

The paper proposes an unsupervised domain adaptation method called MVUDA (Multi-View Unsupervised Domain Adaptation), with the following main features:

- **Self-training framework**: It uses the mean teacher self-training framework, in which the teacher generates pseudo-labels on the unlabeled target camera rig and the student model is trained on them.
- **Novel pseudo-labeling technique**: It introduces a local-max pseudo-labeling method that improves the reliability of the pseudo-labels.
- **No external labeled data**: Unlike previous methods, it does not rely on external labeled monocular datasets, which makes it more practical to deploy.

### Experimental results

The paper conducts experiments on several benchmarks, including the cross-domain benchmarks MultiviewX → Wildtrack and Wildtrack → MultiviewX, as well as two newly introduced benchmarks, GMVD1 → MultiviewX and GMVD2 → MultiviewX. The results show that MVUDA significantly improves over the baseline model and achieves state-of-the-art performance without requiring additional labeled data.

### Summary

By proposing an unsupervised domain adaptation method, this paper addresses the generalization of multi-view pedestrian detection across different camera setups, reduces the dependence on labeled data, and improves the practicality and robustness of the model.
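The summary names a local-max pseudo-labeling method but does not describe how it works. As a rough illustration only, the sketch below assumes the teacher outputs a 2D ground-plane (bird's-eye-view) occupancy heatmap, as multi-view detectors such as MVDet do, and keeps a cell as a pseudo-label only if its score clears a confidence threshold and no neighbor within a small window scores higher. The function name `local_max_pseudo_labels`, the threshold of 0.5, and the window radius of 1 are hypothetical choices, not the paper's formulation.

```python
from typing import List, Tuple

# Assumed representation: the teacher's ground-plane occupancy heatmap as rows of scores.
Heatmap = List[List[float]]


def local_max_pseudo_labels(
    heatmap: Heatmap, threshold: float = 0.5, radius: int = 1
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell whose score is >= threshold and
    is not exceeded by any other cell in its (2*radius + 1) x (2*radius + 1) window.

    Hypothetical illustration; threshold and radius are assumed values.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    labels: List[Tuple[int, int, float]] = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue  # low-confidence cell: not trusted as a pseudo-label
            is_peak = True
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    # Skip the cell itself and neighbors outside the grid.
                    if (dr, dc) == (0, 0) or not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    if heatmap[rr][cc] > score:
                        is_peak = False  # a stronger neighbor owns this peak
            if is_peak:
                labels.append((r, c, score))
    return labels


# Usage: two pedestrians; the 0.6 cell next to the 0.7 peak is suppressed.
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.4, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.3],
]
print(local_max_pseudo_labels(heatmap))  # [(1, 1, 0.8), (3, 3, 0.7)]
```

Keeping only local maxima yields one pseudo-label per response peak instead of several adjacent cells for the same pedestrian. This simple version keeps every cell of an exact-tie plateau, so a real implementation would likely need a tie-breaking rule.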

MVUDA: Unsupervised Domain Adaptation for Multi-view Pedestrian Detection

Multi-View Domain Adaptive Object Detection on Camera Networks

Unsupervised Domain Adaptation for Multispectral Pedestrian Detection

Unsupervised Multi-view Pedestrian Detection

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic Segmentation

Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection

Unsupervised Domain Adaptation Approach for Vision-Based Semantic Understanding of Bridge Inspection Scenes Without Manual Annotations

Pedestrian detection with unsupervised multispectral feature learning using deep neural networks

Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data

Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis

Multiview Latent Space Learning with Progressively Fine-tuned Deep Features for Unsupervised Domain Adaptation

Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean-teacher

Unsupervised Domain Adaptation for Remote-Sensing Vehicle Detection Using Domain-Specific Channel Recalibration

xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

Pedestrian Detection for Autonomous Vehicles Using Virtual-to-Real Augmentation

Few-Shot Supervised Prototype Alignment for Pedestrian Detection on Fisheye Images

Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features

Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning

Multiview Detection with Feature Perspective Transformation

CMT: Co-training Mean-Teacher for Unsupervised Domain Adaptation on 3D Object Detection