Transformer-based models and hardware acceleration analysis in autonomous driving: A survey

Juan Zhong, Zheng Liu, Xi Chen
2023-04-21
Abstract: Transformer architectures have exhibited promising performance in various autonomous driving applications in recent years. On the other hand, their dedicated hardware acceleration on portable computational platforms has become the next critical step for practical deployment in real autonomous vehicles. This survey paper provides a comprehensive overview, benchmark, and analysis of Transformer-based models specifically tailored for autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, such as encoder-decoder and encoder-only structures, and explore their respective advantages and disadvantages. Furthermore, we discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. We specifically illustrate the operator-level comparison between layers from convolutional neural networks, Swin-Transformer, and Transformer with 4D encoder. The paper also highlights the challenges, trends, and current insights in Transformer-based models, addressing their hardware deployment and acceleration issues within the context of long-term autonomous driving applications.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition, Robotics, Systems and Control
What problem does this paper attempt to address?
The paper attempts to address the following problems:

1. **Application of Transformer Architecture in Autonomous Driving**: In recent years, the Transformer architecture has demonstrated promising performance in various autonomous driving tasks. However, how to deploy these models effectively on portable computing platforms (such as embedded systems in autonomous vehicles) and achieve efficient hardware acceleration remains a crucial issue.
2. **Challenges of Hardware Acceleration**: For Transformer models to be widely used in real autonomous driving scenarios, they must be deployed and accelerated efficiently on hardware. This includes optimizing Transformer operators for dedicated hardware (such as AI chips) to improve computational efficiency, reduce power consumption, and ensure real-time performance.
3. **Analysis of Transformer Model Structure and Application**: The paper aims to provide a comprehensive review of the structures of Transformer models designed for autonomous driving tasks (such as encoder-decoder and encoder-only structures) and explores the advantages and disadvantages of each.
4. **Optimization at the Operator Level**: The paper examines Transformer-related operators and their hardware acceleration schemes, taking into account key factors such as quantization and runtime. Specifically, it presents an operator-level comparison between layers from convolutional neural networks (CNNs), Swin-Transformer, and Transformer with 4D encoder.
5. **Long-Term Trends and Challenges**: The paper also highlights the challenges, trends, and current insights concerning the hardware deployment and acceleration of Transformer-based models, particularly in the context of long-term autonomous driving applications.
In summary, the goal of this paper is to provide a comprehensive and in-depth overview of the application of Transformer models in autonomous driving, with an emphasis on model structure, operator-level optimization, and hardware acceleration techniques, in order to promote their practical deployment.

### Formula Examples

Some formulas mentioned in the paper can be presented in Markdown format as follows:

- **Attention Mechanism Formula**:

  \[ \text{Attention}(Q, K, V)=\text{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V \]

  where \(Q\) is the query matrix, \(K\) is the key matrix, \(V\) is the value matrix, and \(d_{k}\) is the dimension of the keys.

- **Multi-Head Attention Mechanism**:

  \[ \text{MultiHead}(Q, K, V)=\text{Concat}(\text{head}_{1},\text{head}_{2},\dots,\text{head}_{h})W^{O} \]

  where each \(\text{head}_{i}\) is calculated as:

  \[ \text{head}_{i}=\text{Attention}(QW_{i}^{Q}, KW_{i}^{K}, VW_{i}^{V}) \]

  Here \(h\) is the number of heads, \(W_{i}^{Q}\), \(W_{i}^{K}\), and \(W_{i}^{V}\) are the learned projection matrices of head \(i\), and \(W^{O}\) is the learned output projection.

These formulas describe the working principle of the attention mechanism, which is the core of the Transformer architecture.
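The two formulas above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the toy shapes (`n`, `d_model`, `h`), and the random weights are all hypothetical, and details such as masking, batching, and biases are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V


def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    # W_q, W_k, W_v have shape (h, d_model, d_head); W_o has shape (h * d_head, d_model).
    # head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
    heads = [
        attention(Q @ W_q[i], K @ W_k[i], V @ W_v[i])
        for i in range(W_q.shape[0])
    ]
    # MultiHead = Concat(head_1, ..., head_h) W^O
    return np.concatenate(heads, axis=-1) @ W_o


# Toy self-attention example; sizes and random weights are assumptions for illustration.
rng = np.random.default_rng(0)
n, d_model, h = 4, 8, 2          # sequence length, model width, number of heads
d_head = d_model // h
X = rng.standard_normal((n, d_model))
W_q = rng.standard_normal((h, d_model, d_head))
W_k = rng.standard_normal((h, d_model, d_head))
W_v = rng.standard_normal((h, d_model, d_head))
W_o = rng.standard_normal((h * d_head, d_model))

out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)
print(out.shape)  # (4, 8)
```

Each head works in a smaller `d_head`-dimensional subspace, so the concatenated heads have the same width as the input. Almost all of the arithmetic here is matrix multiplication plus a row-wise softmax, which is why these two operators are the natural focus of the operator-level hardware analysis the summary describes.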