Inferred box harmonization and aggregation for degraded face detection in crowds

Dong Liang, Qixiang Geng, Han Sun, Huiyu Zhou, Shun’ichi Kaneko
DOI: https://doi.org/10.1007/s11042-022-12319-y
IF: 2.577
2022-04-01
Multimedia Tools and Applications
Abstract: Because objects usually remain at some distance from a surveillance camera, small object detection is a practical issue, and it remains one of the open challenges in the computer vision community. Current detectors usually leverage a more robust backbone network, build one or more multi-scale feature pyramids, or define more precise anchor-box screening criteria. However, distinguishable features are scarce because of appearance degradation and low resolution. In this paper, we leverage high-level context to enhance the capabilities of anchor-based detectors for small and crowded face detection. We first define a face co-occurrence prior based on density maps (FCP-DM) to explore extensive high-level contextual information. We then propose a score-size-specific non-maximum suppression (S3NMS) to replace the traditional non-maximum suppression at the end of anchor-based detectors. Our approach is plug-and-play and model-independent, and it can be attached to existing anchor-based face detectors without extra learning. Compared to the prior art on the WIDER FACE hard set, our method improves Average Precision by 0.1%-1.3%, while on Crowd Face, a dataset we built for testing small and crowded face detection, it improves Average Precision by 1%-6%. The code and dataset are available online.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering