Transformer-Based Sensor Fusion for Autonomous Driving: A Survey

Apoorv Singh

DOI: https://doi.org/10.48550/arXiv.2302.11481

2023-02-23

Abstract: Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics. According to the dataset leaderboards, pairing a transformer-based detection head with a CNN-based feature encoder that extracts features from raw sensor data has emerged as one of the best-performing sensor-fusion frameworks for 3D detection. In this work we provide an in-depth literature survey of recent transformer-based 3D object detection, focusing primarily on sensor fusion. We also briefly cover the basics of Vision Transformers (ViT) so that readers can easily follow the paper, and we briefly review a few of the less dominant, non-transformer-based methods for sensor fusion in autonomous driving. We conclude by summarizing sensor-fusion trends to follow and to provoke future research. A more up-to-date summary can be found at: https://github.com/ApoorvRoboticist/Transformers-Sensor-Fusion

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper explores the application of Transformer-based sensor fusion in autonomous driving, especially for 3D object detection. Specifically, it addresses the following core issues:

1. **Challenges in multi-modal data fusion**: The data generated by different sensors (such as cameras, LiDAR, and RADAR) differ widely in distribution and live in different coordinate systems (for example, LiDAR data is in a Cartesian coordinate system, RADAR data is in a polar coordinate system, and image data is in a perspective coordinate system). These differences make spatial alignment difficult, which in turn complicates the fusion of multi-modal data.
2. **Limitations of existing fusion methods**: The paper discusses several existing fusion methods, including detection-level fusion, proposal-level fusion, and point-level fusion, and points out their respective advantages and disadvantages. For example, although detection-level fusion is simple, it cannot fully exploit the complementary attributes of different sensors within a single bounding-box prediction; point-level fusion is easily affected by sensor calibration errors.
3. **Advantages of Transformer-based fusion methods**: The paper focuses on Transformer-based fusion methods, especially how the Transformer's self-attention and cross-attention mechanisms can model global context relationships between modalities and thereby improve the accuracy of 3D object detection.
4. **Future research directions**: The paper also proposes future research directions, encouraging researchers to explore more innovative Transformer-based sensor fusion methods to further enhance the perception ability of autonomous driving systems.

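
The coordinate-system mismatch described in the first point can be made concrete with a small sketch. It is not taken from the paper: the function names, the axis conventions, the co-located forward-facing camera, and the intrinsics are all assumptions chosen for illustration. It converts a polar RADAR return to Cartesian coordinates and projects it into the perspective image plane with a pinhole model.

```python
import math


def radar_polar_to_cartesian(range_m, azimuth_rad):
    """RADAR return (range, azimuth) -> (x forward, y left) in the vehicle frame."""
    x = range_m * math.cos(azimuth_rad)
    y = range_m * math.sin(azimuth_rad)
    return x, y


def vehicle_to_camera(point_vehicle):
    """Vehicle frame (x forward, y left, z up) -> camera frame (x right, y down, z forward).

    Assumption: the camera sits at the vehicle origin and looks straight ahead,
    so the extrinsic transform reduces to an axis permutation.
    """
    x, y, z = point_vehicle
    return (-y, -z, x)


def project_to_image(point_camera, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point_camera
    if z <= 0:
        return None  # behind the camera, no valid projection
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 1280x720 image.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 360.0

# A RADAR return 20 m straight ahead, assumed to lie at height z = 0.
x, y = radar_polar_to_cartesian(20.0, 0.0)
pixel = project_to_image(vehicle_to_camera((x, y, 0.0)), FX, FY, CX, CY)
print(pixel)
```

With these assumed values the return lands on the principal point of the image. In a real system the extrinsic transform is a calibrated rotation and translation, and any error in it shifts the projected pixel, which is one way to see why point-level fusion is sensitive to calibration.
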
In summary, through review and analysis, this paper aims to give researchers a comprehensive view of the latest progress and future potential of Transformer-based sensor fusion in autonomous driving.
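
As a rough illustration of the cross-attention mechanism mentioned above, the following minimal sketch lets a set of query vectors attend over features from another modality. It is a simplification, not the method of any surveyed paper: it is single-head, omits the learned query/key/value projections, and the names `object_queries` and `image_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Queries come from one source (e.g. object queries); keys and values
    come from another (e.g. image features). Each output row is a
    weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, channel by channel.
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


# Hypothetical toy inputs: one object query, two image feature vectors.
object_queries = [[1.0, 0.0]]
image_features = [[1.0, 1.0], [1.0, 1.0]]  # used as keys
image_values = [[0.0, 2.0], [4.0, 6.0]]    # used as values

fused = cross_attention(object_queries, image_features, image_values)
print(fused)
```

Because the two keys here are identical, the attention weights are uniform and the output is simply the mean of the two value vectors. With distinct keys, each query would weight the other modality's features according to similarity, which is how global context across modalities enters the fused representation.
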

Sensor Fusion by Spatial Encoding for Autonomous Driving

Multi-Sensor Fusion in Automated Driving: A Survey

Vision-RADAR fusion for Robotics BEV Detections: A Survey

A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey

Emerging Trends in Autonomous Vehicle Perception: Multimodal Fusion for 3D Object Detection

A Systematic Survey of Transformer-Based 3D Object Detection for Autonomous Driving: Methods, Challenges and Trends

Multi-Modal 3D Object Detection in Autonomous Driving: A Survey

Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review

TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving

Robust Cognitive Capability in Autonomous Driving Using Sensor Fusion Techniques: A Survey

Learned Fusion: 3D Object Detection using Calibration-Free Transformer Feature Fusion

Multi-Sensor Image Fusion: A Survey of the State of the Art

3D Vision with Transformers: A Survey

Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Transformer-Based Optimized Multimodal Fusion for 3D Object Detection in Autonomous Driving

Autonomous Multi-Sensor Fusion Techniques for Environmental Perception in Self-Driving Vehicles

Multi-modal 3D Object Detection in Autonomous Driving: A Survey and Taxonomy

Multi-modality 3D object detection in autonomous driving: A review