Abstract: In video lane detection, there are rich temporal contexts among successive frames, which are under-explored in existing lane detectors. In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context. Technically, we develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context, respectively. The accumulative attention module continuously accumulates visual information during the journey of a vehicle, while the adjacent attention module propagates this lane information from the previous frame to the current frame. The two modules are meticulously designed based on the transformer architecture. Finally, these long-short context features are fused with the current frame features to predict the lane lines in the current frame. Extensive quantitative and qualitative experiments are conducted on two prevalent benchmark datasets. The results demonstrate the effectiveness of our method, achieving several new state-of-the-art records. The code and models are available at https://github.com/Alex-1337/LaneTCA

What problem does this paper attempt to address?

This paper addresses the following problems in video lane detection:

1. **Existing methods fail to fully utilize the temporal context between frames**: Consecutive video frames carry rich temporal context, but existing lane detectors do not explore it sufficiently. They usually focus on a single-frame image and ignore the correlation between frames.
2. **Complexity and limitations of multi-frame input**: Some existing methods require multiple historical frames as input. This increases the computational complexity, and an inaccurate prediction on a historical frame can degrade the prediction for the current frame. In addition, a fixed number of input frames limits how far back in time the model can look.
3. **Effective fusion of long-term and short-term information**: Combining short-term history (adjacent frames) with long-term history (accumulated frames) is a challenge, and existing methods do not perform well in this regard.

To solve these problems, the authors propose LaneTCA (Temporal Context Aggregation for Video Lane Detection). It abstracts the long-term and short-term temporal contexts with an **accumulative attention module** and an **adjacent attention module**, respectively, and fuses them with the current frame features to improve the accuracy of lane line detection.

Specifically, the main contributions of LaneTCA include:

- Proposing a new framework for effective temporal context aggregation in video lane detection.
- Introducing accumulative attention and adjacent attention modules to handle long-term and short-term temporal information without being restricted to a fixed number of input frames.
- Conducting extensive experiments on popular benchmark datasets, with results that show the effectiveness and superior performance of the method.

Through these improvements, LaneTCA can detect lane lines more accurately in complex driving scenarios and has better temporal and spatial continuity.
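To make the long-short aggregation idea more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each frame is already encoded as a small list of feature tokens, uses a single attention head without learned projections, updates the long-term memory with an exponential moving average, and fuses the three feature sets by simple averaging. The names `TemporalContextSketch`, `cross_attention`, and `momentum` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Scaled dot-product attention: each query token reads from the memory tokens.

    Simplification (assumption): keys and values are the memory tokens
    themselves, with no learned projection matrices.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(dim)]
        )
    return outputs


class TemporalContextSketch:
    """Illustrative long-short temporal aggregation (hypothetical, not LaneTCA itself)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed weight of old memory in the running average
        self.accumulated = None   # long-term memory, same size for any video length
        self.previous = None      # features of the previous frame (short-term context)

    def step(self, current):
        """Fuse the current frame's tokens with long-term and short-term context."""
        if self.previous is None:
            # First frame: no history yet, so pass the features through unchanged.
            fused = [row[:] for row in current]
            self.accumulated = [row[:] for row in current]
        else:
            long_ctx = cross_attention(current, self.accumulated)
            short_ctx = cross_attention(current, self.previous)
            # Assumed fusion rule: plain average of the three feature sets.
            fused = [
                [(c + l + s) / 3.0 for c, l, s in zip(c_row, l_row, s_row)]
                for c_row, l_row, s_row in zip(current, long_ctx, short_ctx)
            ]
            # Assumed memory update: exponential moving average over frames.
            m = self.momentum
            self.accumulated = [
                [m * a + (1.0 - m) * c for a, c in zip(a_row, c_row)]
                for a_row, c_row in zip(self.accumulated, current)
            ]
        self.previous = current
        return fused
```

Calling `step` once per frame keeps the stored state at a constant size no matter how many frames have been seen, which is the property the summary contrasts with methods that take a fixed window of historical frames as input. A real model would replace the averaging and the moving average with learned transformer layers.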

LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation

Lane Mark Detection with Pre-Aligned Spatial-Temporal Attention

Bi2Lane: Bi-Directional Temporal Refinement with Bi-Level Feature Aggregation for 3D Lane Detection

3D Lane Detection With Attention in Attention

VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

TCLaneNet: Task-Conditioned Lane Detection Network Driven by Vibration Information

Interactive Attention Learning on Detection of Lane and Lane Marking on the Road by Monocular Camera Image

StructLane: Leveraging Structural Relations for Lane Detection

An Efficient Lane Detection Network with Channel-Enhanced Coordinate Attention

Unsupervised Domain Adaptive Lane Detection via Contextual Contrast and Aggregation

LDTR: Transformer-based Lane Detection with Anchor-chain Representation

PriorLane: A Prior Knowledge Enhanced Lane Detection Approach Based on Transformer

CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

Multi-Modal Attention Guided Real-Time Lane Detection

Lane Detection Transformer Based on Multi-frame Horizontal and Vertical Attention and Visual Transformer Module

Enhanced SCNN-Based Hybrid Spatial-Temporal Lane Detection Model for Intelligent Transportation Systems

LHFFNet: A hybrid feature fusion method for lane detection

LaneFormer: An Efficient Transformer-based Network for Fast Lane Detection

A Hybrid Spatial-temporal Deep Learning Architecture for Lane Detection

Sparse Laneformer

End-to-end Lane Shape Prediction with Transformers