OBhunter: An ensemble spectral-angular based transformer network for occlusion detection
Jiangnan Zhang, Kewen Xia, Zhiyi Huang, Sijie Wang, Romoke Grace Akindele
DOI: https://doi.org/10.1016/j.eswa.2024.123324
IF: 8.5
2024-02-06
Expert Systems with Applications
Abstract: Hunting for blocked objects is complicated in crowded scenes due to the frequent occlusions. However, creating an effective occlusion detector remains challenging for the following inherent reasons: (1) the limited feature extraction capacity of encoders and (2) the loss of highly overlapped objects by decoders. We propose a spectral-angular ensemble-based Transformer network, OBhunter, to address these two issues. In OBhunter, an effective encoder with robust feature extraction performance is constructed through the ensemble spectral-angular self-attention (ESA) mechanism, extending the original softmax-based attention to the spectral characteristic dimension. To tackle the second issue, we branch the decoder using our crowded region generator (CRG). These two branches undergo differential processing by ensemble spectral-angular region (ESR) loss, a multi-task training loss function, to prevent erroneous suppression of proposal boxes. Extensive experiments demonstrate that our OBhunter is effective in occlusion detection based on CrowdHuman, CityPersons, and Caltech-Pedestrians datasets. With OBhunter, the occlusion detection performance achieves 33.52% MR⁻². Additionally, we validate the robustness of our OBhunter on a less crowded dataset such as MS-COCO.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science