Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting

Qi Zhang, Yunfei Gong, Daijie Chen, Antoni B. Chan, Hui Huang
2024-05-30
Abstract: Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and camera calibration errors. This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach that better fuses multi-camera information under large scenes. Besides, a large synthetic dataset is adopted to enhance the model's generalization ability and enable more practical evaluation and comparison. The model's performance on new testing scenes is further improved with a simple domain adaptation technique. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance. See code here: https://vcc.tech/research/2024/MVD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: current deep learning-based multi-view people detection (MVD) methods are trained and evaluated on datasets with small scenes, a limited number of frames, and fixed camera viewpoints, so they may perform poorly in larger, more complex real-world scenes. Specifically, existing methods have three main limitations:

1. **Limited scene scale**: Existing MVD methods are mainly evaluated in small scenes of about 20 meters by 20 meters, whereas scenes in practical applications may be much larger, with more severe occlusions and camera calibration errors.
2. **Limited data volume and camera viewpoints**: Existing datasets contain relatively few frames (for example, the Wildtrack dataset has only a few hundred frames), and the camera viewpoints are fixed (for example, Wildtrack has 7 viewpoints and MultiviewX has 6 viewpoints). This restricts thorough verification and comparison of different methods.
3. **Poor generalization ability**: Existing methods are trained on a single scene and are prone to overfitting to specific camera layouts, which makes it difficult for them to generalize to new, unseen scenes and different camera layouts.

To address these limitations, the paper proposes a supervised view-wise contribution weighting method that better fuses multi-camera information, especially in large-scale scenes. In addition, the authors use a large-scale synthetic dataset to enhance the model's generalization ability, and they further improve the model's performance on new test scenes with a simple domain adaptation technique.

### Specific problem descriptions

- **Multi-view people detection in large-scale scenes**: How to achieve accurate people detection in larger and more complex scenes, especially in the presence of severe occlusions and camera calibration errors.
- **Improving the generalization ability of the model**: How to make the model adapt to new, unseen scenes and different camera layouts, rather than being limited to the single scene used during training.
- **Limitations of datasets**: How to overcome the limitations of existing datasets in terms of scene scale, number of frames, and camera viewpoints, in order to evaluate and compare different MVD methods more comprehensively.

By solving these problems, the paper aims to extend multi-view people detection to more challenging and practical application scenarios.
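
As a rough illustration of what "view-wise contribution weighting" could mean at the fusion step, the sketch below combines per-view ground-plane maps using a per-cell softmax over view scores. It is not the authors' implementation: the function name `fuse_views`, the scalar-per-cell features, and the softmax normalization are all assumptions made for illustration. The summary above does not specify how the weights are predicted or supervised, so the scores are simply taken as given inputs here.

```python
import math
from typing import List

Grid = List[List[float]]  # one ground-plane map, indexed [row][col]


def fuse_views(features: List[Grid], scores: List[Grid]) -> Grid:
    """Fuse per-view ground-plane maps with per-cell softmax weights.

    features[v][r][c]: value from view v projected to ground cell (r, c).
    scores[v][r][c]:   unnormalized contribution score of view v at that cell.
    Both inputs are hypothetical; a real model would predict them.
    """
    n_views = len(features)
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell_scores = [scores[v][r][c] for v in range(n_views)]
            peak = max(cell_scores)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in cell_scores]
            total = sum(exps)
            # Weighted sum over views; the weights at each cell sum to 1.
            fused[r][c] = sum(
                (exps[v] / total) * features[v][r][c] for v in range(n_views)
            )
    return fused


# Two views, one row of two cells; equal scores reduce to a plain average.
features = [[[1.0, 0.0]], [[0.0, 1.0]]]
scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
fused = fuse_views(features, scores)
```

Normalizing the weights per ground-plane cell, rather than per view, lets a camera dominate where it sees a region well and contribute little where it is occluded or far away, which is the intuition behind weighting views in large scenes.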