A Real-Time DETR Approach to Bangladesh Road Object Detection for Autonomous Vehicles

Irfan Nafiz Shahan, Arban Hossain, Saadman Sakib, Al-Mubin Nabil
2024-11-23
Abstract: In recent years, we have witnessed a paradigm shift in the field of computer vision with the advent of the transformer architecture. Detection Transformers have become a state-of-the-art solution to object detection and are a potential candidate for road object detection in autonomous vehicles. Despite the abundance of object detection schemes, real-time DETR models have been shown to achieve significantly better inference times with minimal loss of accuracy. In our work, we applied Real-Time DETR (RTDETR) object detection to BadODD, a road object detection dataset collected in Bangladesh, and performed the necessary experimentation and testing. Our model achieved an mAP50 score of 0.41518 on the public 60% test set and 0.28194 on the private 40% test set.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses real-time object detection on Bangladeshi roads in support of the safe operation of autonomous vehicles. Specifically, the authors use a real-time detection model based on the Transformer architecture (Real-Time DETR, RTDETR) and run experiments and tests on the Bangladesh road object detection dataset (BadODD), aiming to improve both the speed and the accuracy of object detection.

### Problem Background

1. **Importance of Computer Vision**
   - Computer vision (CV) plays a crucial role in modern technology, especially in autonomous vehicles. Accurate and fast object detection is vital for the safety and decision-making ability of autonomous vehicles.
2. **Limitations of Existing Methods**
   - Traditional object detection methods such as Region-based Convolutional Neural Networks (R-CNN) and You-Only-Look-Once (YOLO) have advantages in speed, but their accuracy, particularly in complex scenarios, is somewhat lacking.
   - DETR (Detection Transformer) offers superior performance, but its long inference time makes it unsuitable for real-time applications.
3. **Advantages of RTDETR**
   - RTDETR combines the speed advantage of YOLO with the expressive power of the Transformer. It can significantly reduce inference time while maintaining high accuracy, which makes it suitable for real-time object detection tasks.

### Research Objectives

- **Improve Real-Time Performance**: Optimize the model structure and parameter configuration so that the model can achieve real-time detection in practical applications.
- **Improve Accuracy**: Train and test on a specific dataset (BadODD) to ensure the model's detection accuracy in complex road environments.
- **Meet Challenges**: Deal with the challenges present in the dataset, such as class imbalance and image quality problems (halos, night-time images, windshield stains, etc.), to make the model more robust.

### Main Contributions

- **Model Selection and Optimization**: Select the RTDETR model and optimize inference speed by adjusting the number of decoder layers and other hyper-parameters.
- **Data Pre-processing**: Pre-process the dataset in detail, including image resizing, label correction, and the application of multiple data augmentation methods.
- **Experimental Verification**: Verify the model's performance on the public and private test sets through a large number of experiments, obtaining mAP50 scores of 0.41518 and 0.28194 respectively.

### Conclusions and Future Work

Although RTDETR performs well in terms of real-time performance and accuracy, some challenges remain, such as the detection of small and occluded objects and performance under extreme conditions. Future work can focus on further optimizing the model structure, improving data pre-processing methods, and exploring solutions suited to different scenarios.

Through these efforts, this research provides new ideas and technical support for autonomous vehicles to achieve safer and more efficient object detection in complex road environments.
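
For readers unfamiliar with the mAP50 metric quoted above, the following is a minimal sketch of how such a score can be computed: a prediction counts as correct when its intersection-over-union (IoU) with an unmatched ground-truth box is at least 0.5, average precision (AP) is the area under the resulting precision-recall curve, and mAP50 is the mean AP over classes. This is an illustration only, not the authors' evaluation code or the scoring script used for the BadODD test sets. The function names (`iou`, `average_precision_50`, `mean_ap_50`), the `(x1, y1, x2, y2)` box format, and the greedy matching rule are all assumptions, and the sketch pools boxes from a single image for brevity.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
Pred = Tuple[float, Box]                 # (confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision_50(preds: List[Pred], gts: List[Box]) -> float:
    """AP at IoU >= 0.5 for one class (simplified: boxes from one image)."""
    if not gts:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags: List[bool] = []
    for _, box in ranked:
        # Greedily pick the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= 0.5:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each ranked prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # All-point interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap_50(per_class: Dict[str, Tuple[List[Pred], List[Box]]]) -> float:
    """Mean of per-class AP50 values."""
    if not per_class:
        return 0.0
    aps = [average_precision_50(p, g) for p, g in per_class.values()]
    return sum(aps) / len(aps)
```

A real evaluation would aggregate predictions over every image in the test set and follow the exact matching and interpolation rules of the benchmark in question, which may differ from the choices made here.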