Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions

Arunkumar Rathinam, Leo Pauly, Abd El Rahman Shabayek, Wassim Rharbaoui, Anis Kacem, Vincent Gaudillière, Djamila Aouada
2024-11-06
Abstract: Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adversarial illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. This assumption often does not hold in real-world applications, where only partial overlap between images can occur due to sensor configuration. Moreover, sensor failure can cause loss of information in one modality. In this paper, we propose a novel module called the Hybrid Attention (HA) mechanism as our main contribution to mitigate performance degradation caused by partial overlap and sensor failure, i.e. when at least part of the scene is acquired by only one sensor. We propose an improved RGB-T fusion algorithm, robust against partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with resource constraints in embedded systems. We conducted experiments by simulating various partial overlap and sensor failure scenarios to evaluate the performance of our proposed method. The results demonstrate that our approach outperforms state-of-the-art methods, showcasing its superiority in handling real-world challenges.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the performance degradation that multispectral pedestrian detection suffers under real-world conditions because of partial overlap and sensor failures. Specifically:

1. **Partial Overlap**: In practical applications, RGB (visible light) and thermal (infrared) images often overlap only partially. This may be caused by different fields of view in stereo camera configurations or by pixel-level misalignment.
2. **Sensor Failures**: Sensor failures can lead to complete or partial loss of information in one modality. For example, a failure in a camera sensor array may result in the loss of all or part of an image.

Existing fusion methods typically assume that RGB and thermal image pairs are fully overlapping, which is often not the case in real-world applications. Sensor failures cause further information loss and degrade detection performance.

To address these issues, the authors propose a new module, the Hybrid Attention (HA) mechanism, to mitigate the impact of partial overlap and sensor failures. The module combines self-attention and cross-attention mechanisms and maintains high detection performance even when part of the scene is captured by only one sensor. The authors also adopt a lightweight, mobile-friendly backbone network to accommodate the resource constraints of embedded systems.

### Main Contributions

1. **Introduction of the Hybrid Attention (HA) Module**: This module reduces performance degradation caused by modality-specific blackouts by combining self-attention and cross-attention mechanisms.
2. **Improved RGB-T Fusion Algorithm**: The proposed Hybrid Attention-based Multi-Label Pedestrian Detector (HA-MLPD) performs well in real-world scenarios such as partial overlap and sensor failures, and is resource-friendly.
3. **Experimental Validation**: The effectiveness of the proposed method is validated by simulating various partial overlap and sensor failure scenarios. Experimental results show that the method outperforms existing methods in handling real-world challenges.

### Experimental Results

- **Dual-Modal Detection**: In dual-modal (RGB and thermal) scenarios, HA-MLPD outperforms existing methods.
- **Single-Modal Blackout**: When the RGB or thermal modality fails completely, HA-MLPD still performs well, outperforming models that use only RGB or only thermal data.
- **Partial Overlap**: In partial overlap scenarios, HA-MLPD significantly outperforms other methods, especially in cases of side blackouts and peripheral blackouts.

### Conclusion

By introducing the Hybrid Attention mechanism, this study addresses the performance degradation caused by partial overlap and sensor failures in multispectral pedestrian detection and makes RGB-T detection more robust in real-world applications.
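The blackout scenarios mentioned in the summary (complete, side, and peripheral) can be illustrated by zeroing regions of one modality's image. The following is a minimal sketch only: the function name `apply_blackout`, the `fraction` parameter, and the exact geometry of each mask are assumptions made for illustration, since the summary does not specify how the authors generate their test conditions.

```python
import numpy as np

def apply_blackout(image, mode, fraction=0.5):
    """Return a copy of `image` (H, W, C) with a simulated blackout.

    Hypothetical modes (assumed geometry, not taken from the paper):
      "none"       - image returned unchanged
      "full"       - whole image zeroed (complete sensor failure)
      "side"       - leftmost `fraction` of the columns zeroed
      "peripheral" - border zeroed, centred window kept
    """
    out = image.copy()
    h, w = out.shape[:2]
    if mode == "none":
        return out
    if mode == "full":
        out[:] = 0
    elif mode == "side":
        out[:, : int(w * fraction)] = 0
    elif mode == "peripheral":
        # Margin on each side, so `fraction` of every dimension is lost in total.
        mh, mw = int(h * fraction / 2), int(w * fraction / 2)
        centre = out[mh:h - mh, mw:w - mw].copy()
        out[:] = 0
        out[mh:h - mh, mw:w - mw] = centre
    else:
        raise ValueError(f"unknown blackout mode: {mode!r}")
    return out

# Usage: black out half of a dummy thermal frame while leaving RGB intact.
thermal = np.ones((8, 10, 1), dtype=np.float32)
thermal_side = apply_blackout(thermal, "side", fraction=0.5)
thermal_peripheral = apply_blackout(thermal, "peripheral", fraction=0.5)
```

Applying such a mask to only one modality of each RGB-T pair gives an input in which part of the scene is seen by a single sensor, which is the condition the summary describes.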
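The idea of combining self-attention and cross-attention can be sketched with plain scaled dot-product attention over two sets of feature tokens. This is not the authors' HA module: the summary does not describe its layers, projections, or how the two attention outputs are merged, so the simple averaging below, the absence of learned weights, and the names `attention` and `hybrid_attention` are all assumptions chosen to keep the example small.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention for (N, d) token matrices."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores) @ value

def hybrid_attention(rgb, thermal):
    """Toy fusion: average self-attention and cross-attention per modality.

    Assumption: no learned Q/K/V projections and a plain 50/50 average;
    a real module would learn both.
    """
    rgb_self = attention(rgb, rgb, rgb)                # RGB attends to itself
    th_self = attention(thermal, thermal, thermal)     # thermal attends to itself
    rgb_cross = attention(rgb, thermal, thermal)       # RGB queries thermal
    th_cross = attention(thermal, rgb, rgb)            # thermal queries RGB
    return 0.5 * (rgb_self + rgb_cross), 0.5 * (th_self + th_cross)

# Usage: a thermal blackout (all-zero tokens) with intact RGB tokens.
rng = np.random.default_rng(0)
rgb_tokens = rng.normal(size=(6, 4))
thermal_tokens = np.zeros((6, 4))
rgb_out, thermal_out = hybrid_attention(rgb_tokens, thermal_tokens)
```

In this toy setting the blacked-out thermal stream contributes nothing through its own self-attention, but its cross-attention branch still pulls in RGB information, while the RGB stream keeps its self-attention output. That is the intuition behind pairing the two mechanisms when one sensor covers a region alone.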