Infrared Multi-Object Contrast Enhancement and Detection Based on Layered Visual Transformer Network for Autonomous Driving

Min Li, Jinhui Lan, Luyang Wang, Ying Zhang, Kun Huang
DOI: https://doi.org/10.1109/jsen.2024.3466397
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: For autonomous driving in urban environments, reliable object detection under varied lighting conditions is crucial. Thermal infrared cameras capture images passively, without relying on external light sources, but the low contrast of infrared images complicates the detection and recognition of multiple objects. To address this challenge, we propose the layered visual transformer network (LVTN), which is divided into three components: a backbone, an encoder, and a combined decoder and detection head. (1) The backbone network, LVTN, transforms image pixels into vectors and employs layering techniques combined with feature aggregation from each layer to maintain the integrity of features and information in infrared images. (2) The encoder introduces adaptive-feedback attention (AFA) in place of traditional attention mechanisms, focusing on subtle object features and enhancing the contrast between the object and the background. (3) The decoder and detection head introduce the fine-grained matching loss function (FMLF), which dynamically adjusts training weights, gives higher attention to object-dense regions, and addresses the issues of multi-scale and dense object detection. We trained and validated our model on the FLIR-ADAS and KAIST datasets, achieving mAP scores of 62.6% and 75.6%, respectively, surpassing other state-of-the-art infrared detection algorithms.