Abstract: Infrared object detection and tracking in dense urban traffic remain challenging because of low contrast, small intra-class differences, and frequent false positives and false negatives. This study presents YOLO-IR, a detection algorithm based on an enhanced YOLOv8s, and YOLO-DeepOC-IR, a comprehensive infrared multi-object tracking method for urban traffic that integrates detection and tracking; both place particular emphasis on image processing techniques. During preprocessing, three infrared image enhancement techniques are applied to improve reliability in dense scenes: local contrast multi-scale enhancement, non-local means denoising, and contrast limited adaptive histogram equalization (CLAHE). The original YOLOv8s backbone is replaced with MobileViTv3 to improve detection accuracy and robustness, and a multi-layer infrared feature extraction module that combines Canny edge detection, Gabor filtering, and morphological opening (open operation) layers is incorporated into the detector to improve object detection in infrared imagery. In the tracker, feature extraction and matching are improved with the learned arrangements of three patch codes (LATCH) descriptor and locality-sensitive hashing (LSH), underscoring the role of image processing in infrared traffic surveillance.
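LATCH produces binary descriptors that are compared by Hamming distance, and LSH avoids comparing every new detection against every stored track descriptor. The abstract does not give the paper's LSH configuration, so the following is only a minimal sketch of bit-sampling, the classic LSH family for Hamming space. Assumptions: descriptors are 256-bit Python integers generated at random as stand-ins for real LATCH output (no LATCH extraction is performed), and the class name `BitSamplingLSH`, the table count, the key length, and the `max_dist` threshold are all hypothetical.

```python
import random
from collections import defaultdict

# Assumption: 256-bit binary descriptors stored as Python ints. The descriptors
# used below are random stand-ins, not real LATCH output.
DESC_BITS = 256


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Bit-sampling LSH index for Hamming space (hypothetical parameters).

    Each table keys a descriptor by the values of `key_bits` randomly chosen
    bit positions; descriptors that share a key in any table become
    candidates, which are then ranked by exact Hamming distance.
    """

    def __init__(self, n_tables: int = 8, key_bits: int = 16,
                 n_bits: int = DESC_BITS, seed: int = 0):
        rng = random.Random(seed)
        self.positions = [rng.sample(range(n_bits), key_bits) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # list of (track_id, descriptor)

    @staticmethod
    def _key(desc: int, positions) -> tuple:
        # The sampled bits, taken in a fixed order, form the bucket key.
        return tuple((desc >> p) & 1 for p in positions)

    def add(self, track_id, desc: int) -> None:
        idx = len(self.items)
        self.items.append((track_id, desc))
        for pos, table in zip(self.positions, self.tables):
            table[self._key(desc, pos)].append(idx)

    def query(self, desc: int, max_dist: int = 64):
        """Return (track_id, distance) of the closest candidate within max_dist, else None."""
        candidates = set()
        for pos, table in zip(self.positions, self.tables):
            candidates.update(table.get(self._key(desc, pos), ()))
        best = None
        for idx in candidates:
            track_id, stored = self.items[idx]
            dist = hamming(desc, stored)
            if dist <= max_dist and (best is None or dist < best[1]):
                best = (track_id, dist)
        return best


# Usage: index five existing tracks, then match a slightly perturbed descriptor.
rng = random.Random(42)
index = BitSamplingLSH()
track_desc = {tid: rng.getrandbits(DESC_BITS) for tid in range(5)}
for tid, d in track_desc.items():
    index.add(tid, d)

noisy = track_desc[3]
for p in rng.sample(range(DESC_BITS), 3):  # flip 3 distinct bits
    noisy ^= 1 << p
print(index.query(noisy))
```

Using several short keys rather than one long key is the usual trade-off: each table alone misses some near neighbours, but the union over tables recovers them with high probability, while unrelated descriptors (about 128 bits apart on average here) rarely share a bucket and are rejected by the distance check anyway.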
Experimental results on the FLIR ADAS v2 and InfiRay datasets indicate superior performance: the method achieves 78.6% mAP at 151.1 FPS in detection and, in multi-object tracking, up to 80.8% multiple object tracking accuracy (MOTA), 78.6% identification F1 score (IDF1), and 62.1% higher order tracking accuracy (HOTA).
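For reference, the tracking figures above are standard multi-object tracking metrics. The sketch below computes MOTA and IDF1 from already-matched counts and HOTA from per-threshold detection and association accuracies (DetA, AssA). It assumes the matching step that produces these quantities (for example IoU-based assignment between predictions and ground truth) has already been run and is not shown; the function names are illustrative and not taken from any evaluation toolkit. Standard HOTA evaluation averages over 19 localization thresholds α = 0.05, 0.10, ..., 0.95; the example uses two for brevity, and all numbers in it are invented.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClearMotCounts:
    """Error counts summed over all frames of a sequence."""
    false_negatives: int
    false_positives: int
    id_switches: int
    num_gt: int  # total ground-truth boxes over all frames


def mota(c: ClearMotCounts) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT; negative when errors outnumber GT boxes."""
    if c.num_gt <= 0:
        raise ValueError("MOTA is undefined without ground-truth boxes")
    return 1.0 - (c.false_negatives + c.false_positives + c.id_switches) / c.num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), from the identity-level matching."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota(det_a: Sequence[float], ass_a: Sequence[float]) -> float:
    """HOTA: mean over localization thresholds alpha of sqrt(DetA_alpha * AssA_alpha)."""
    if len(det_a) != len(ass_a) or not det_a:
        raise ValueError("need non-empty DetA/AssA sequences of equal length")
    return sum(math.sqrt(d * a) for d, a in zip(det_a, ass_a)) / len(det_a)


# Invented counts for illustration only (not results from the paper).
print(mota(ClearMotCounts(false_negatives=120, false_positives=60, id_switches=20, num_gt=1000)))
print(idf1(idtp=800, idfp=150, idfn=250))
print(hota([0.64, 0.36], [0.64, 0.36]))
```

MOTA counts every frame-level error equally, IDF1 rewards keeping the same identity on an object over time, and HOTA balances detection and association through the geometric mean, which is why a tracker can rank differently under each of them.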

Infrared multi‐target detection and tracking in dense urban traffic scenes