OADB-Net: An Occlusion-Aware Dual-Branch Network for Pedestrian Detection
Jun Li, Tong Chen, Kangyou Ji, Qiming Li
DOI: https://doi.org/10.1109/tits.2024.3495814
IF: 8.5
2024-01-01
IEEE Transactions on Intelligent Transportation Systems
Abstract: With advances in deep learning, detecting occluded pedestrians has become a focal point of research. Extracting pedestrian parts has proven to be an effective way to handle occlusion. However, existing methods mainly rely on Region Proposal Networks (RPNs) for part feature extraction. These RPN-based methods suffer from limitations such as complex structures and limited receptive fields, which hinder their ability to capture global dependency information. To overcome these challenges, we propose a simple but effective Occlusion-Aware Dual-Branch Network (OADB-Net), based on an anchor-free framework, for pedestrian detection in crowded scenes. Specifically, we design a dual-branch occlusion-aware detection head, consisting of a full-body detection branch and a head-shoulder detection branch, to address the occlusion issue in crowded scenes. The head-shoulder detection branch handles heavily occluded instances, while the full-body branch focuses on non-heavily occluded instances. Furthermore, we propose a Cross-Layer Non-Local Module (CLNL-Module), which captures long-range dependencies across feature layers, to effectively differentiate the relationship between the pedestrian body and body parts while integrating essential features for a more accurate and discriminative pedestrian representation. This strengthens the connections between the two detection branches and enhances the responses of their respective center heatmaps. OADB-Net leverages part and full-body features to handle pedestrians with varying degrees of occlusion, while avoiding the limitations of RPN-based methods. In heavy occlusion settings, OADB-Net achieves average miss rates of 39.9%, 26.8%, and 43.1% on the CityPersons, Caltech, and CrowdHuman datasets, respectively, and demonstrates superior performance in traffic scenes.
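The idea behind the CLNL-Module, in which each position in one feature layer aggregates information from every position in another layer, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cross_layer_non_local`, the identity (unlearned) embeddings, the dot-product softmax weighting, and the residual addition are all assumptions taken from the generic non-local attention pattern. Feature maps are assumed to be flattened into lists of position vectors with equal channel counts.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_layer_non_local(query_feats, context_feats):
    """Hypothetical cross-layer non-local attention.

    query_feats:   N position vectors (C channels each) from one layer.
    context_feats: M position vectors (C channels each) from another layer.
    Returns N vectors: each query plus a softmax-weighted sum of all
    context vectors (residual connection). No learned projections are
    used here; a real module would normally embed both inputs first.
    """
    outputs = []
    for q in query_feats:
        # Similarity of this query position to every context position.
        weights = softmax([dot(q, k) for k in context_feats])
        # Weighted sum of context vectors, computed channel by channel.
        aggregated = [
            sum(w * k[c] for w, k in zip(weights, context_feats))
            for c in range(len(q))
        ]
        outputs.append([qc + ac for qc, ac in zip(q, aggregated)])
    return outputs


# Usage: two positions from a shallow layer attend to three positions
# from a deeper layer (toy values, 2 channels).
shallow = [[1.0, 0.0], [0.0, 1.0]]
deep = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
for vec in cross_layer_non_local(shallow, deep):
    print(vec)
```

Because every query position attends to all context positions, the receptive field is global, which is the property the abstract contrasts with RPN-based part extraction. How the actual CLNL-Module embeds features and feeds the two detection branches is not specified in the abstract and is left out of this sketch.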