Efficient Vision Transformer for Accurate Traffic Sign Detection

Javad Mirzapour Kaleybar, Hooman Khaloo, Avaz Naghipour

2023-11-03

Abstract: This research paper addresses the challenges associated with traffic sign detection in self-driving vehicles and driver assistance systems. The development of reliable and highly accurate algorithms is crucial for the widespread adoption of traffic sign recognition and detection (TSRD) in diverse real-life scenarios. However, this task is complicated by suboptimal traffic images affected by factors such as camera movement, adverse weather conditions, and inadequate lighting. This study specifically focuses on traffic sign detection methods and introduces the application of the Transformer model, particularly the Vision Transformer variants, to tackle this task. The Transformer's attention mechanism, originally designed for natural language processing, offers improved parallel efficiency. Vision Transformers have demonstrated success in various domains, including autonomous driving, object detection, healthcare, and defense-related applications. To enhance the efficiency of the Transformer model, the research proposes a novel strategy that integrates a locality inductive bias and a transformer module. This includes the introduction of the Efficient Convolution Block and the Local Transformer Block, which effectively capture short-term and long-term dependency information, thereby improving both detection speed and accuracy. Experimental evaluations demonstrate the significant advancements achieved by this approach, particularly when applied to the GTSDB dataset.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses traffic sign detection (TSD) in autonomous vehicles and driver assistance systems, where reliable recognition of traffic signs is essential to safety. In practice, camera movement, adverse weather conditions, and insufficient lighting degrade image quality and make signs harder to detect and recognize. To improve both the accuracy and the speed of detection, the paper proposes a method based on the Vision Transformer (ViT) and its variants.

Specifically, the authors propose an Efficient Vision Transformer architecture that combines a locality inductive bias with transformer modules to capture both short-term and long-term dependencies in image data. The architecture has two core components:

1. **Efficient Convolution Block (ECB)**: Captures short-term dependencies in images. The ECB uses a Multi-Head Convolutional Attention (MHCA) mechanism to handle local features.
2. **Local Transformer Block (LTB)**: Captures long-term dependencies in images, complementing the ECB. The LTB also acts as a portable mixer between high- and low-frequency signals, which strengthens the modeling capability of the whole network.

Experiments on the German Traffic Sign Detection Benchmark (GTSDB) dataset show that the proposed method improves both accuracy and detection speed, with gains reported in the AP50, AP75, and overall Average Precision (AP) metrics. In summary, the study tackles the technical challenges of traffic sign detection with a novel Transformer architecture and demonstrates its effectiveness on GTSDB.
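The AP50 and AP75 metrics mentioned above rest on an Intersection over Union (IoU) matching criterion: a predicted box counts as a correct detection only if its IoU with a ground-truth box reaches 0.5 or 0.75, respectively. The sketch below illustrates that criterion alone. It is not the paper's evaluation code; the helper names `iou` and `is_true_positive` and the example boxes are hypothetical, and a full AP computation would additionally rank detections by confidence and integrate a precision-recall curve.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical example: a 40x40 sign and a prediction shifted by 5 pixels in x and y.
truth_box: Box = (10.0, 10.0, 50.0, 50.0)
pred_box: Box = (15.0, 15.0, 55.0, 55.0)

print(f"IoU = {iou(pred_box, truth_box):.3f}")
print("counts at IoU 0.50:", is_true_positive(pred_box, truth_box, 0.50))
print("counts at IoU 0.75:", is_true_positive(pred_box, truth_box, 0.75))
```

In this example the same prediction is accepted at the 0.5 threshold but rejected at 0.75, which is why AP75 rewards tighter localization than AP50. Small objects such as distant traffic signs are especially sensitive to this, since a shift of a few pixels removes a large share of the overlap.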

Traffic Sign Recognition Using Local Vision Transformer

Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers

Pyramid Transformer for Traffic Sign Detection

Short-Term Speed Forecasting of Large-Scale Urban Road Network Based on Transformer

TSD-DETR: A Lightweight Real-Time Detection Transformer of Traffic Sign Detection for Long-Range Perception of Autonomous Driving

Potholes and traffic signs detection by classifier with vision transformers

A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions

Traffic Sign Recognition from Digital Images by Using Deep Learning

A Vision Transformer Approach for Traffic Congestion Prediction in Urban Areas

Real-time traffic sign detection network based on Swin Transformer

Efficient Vision Transformer YOLOv5 for Accurate and Fast Traffic Sign Detection

Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers

Driver Distraction Behavior Detection Using a Vision Transformer Model Based on Transfer Learning Strategy

HCLT-YOLO: A Hybrid CNN and Lightweight Transformer Architecture for Object Detection in Complex Traffic Scenes

DetectFormer: Category-Assisted Transformer for Traffic Scene Object Detection

Training Strategies for Vision Transformers for Object Detection

Efficient Inductive Vision Transformer for Oriented Object Detection in Remote Sensing Imagery

Traffic sign detection based on classic visual recognition models

Improved object detection method for unmanned driving based on Transformers

Machine Vision Based Traffic Sign Detection Methods: Review, Analyses and Perspectives