Multiscale Attention and Feature Decomposition Network for Surveillance Vehicle Detection

Wei Xie, Weiming Liu, Yuan Dai, Ruikang Liu
DOI: https://doi.org/10.1109/jsen.2024.3449329
IF: 4.3
2024-10-04
IEEE Sensors Journal
Abstract: Detecting vehicles from a surveillance perspective is crucial due to its extensive application in traffic management and public safety. However, existing methods naively concatenate feature maps of different scales in a feature pyramid network (FPN), treating each feature map equally. This leads to smaller scale feature maps being overshadowed by larger scale ones, making it difficult to detect small vehicles. Additionally, since classification and localization tasks focus on features in different spatial locations, the shared features between these two tasks result in a feature conflict problem. However, existing methods are insufficient to address this problem, resulting in high classification scores being predicted from features at certain spatial locations, while simultaneously generating imprecise bounding boxes from the same features. To address these challenges, this article presents a multiscale attention and feature decomposition network (MAFD-Net) based on multiscale attention feature fusion (MSAFF), task-aware feature decomposition (TAFD), and task-aware loss (TL). MSAFF effectively integrates features of different scales in FPN by utilizing three distinct domain-level attention mechanisms. TAFD introduces a learning-guided feature decomposition into the classification and localization branches, allowing these two branches to adaptively discover optimal location features. TL guides TAFD to make further task-aware predictions. Extensive experiments demonstrate the superiority of MAFD-Net compared to state-of-the-art (SOTA) algorithms. Specifically, the detection accuracy was improved from 58.8% to 63.4% on the UA-DETRAC dataset and from 40.7% to 44.4% on the COCO dataset.
Engineering, Electrical & Electronic; Instruments & Instrumentation; Physics, Applied