LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation

Keyi Zhou, Li Li, Wengang Zhou, Yonghui Wang, Hao Feng, Houqiang Li
2024-08-25
Abstract: In video lane detection, there is rich temporal context among successive frames, which is under-explored in existing lane detectors. In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context. Technically, we develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context, respectively. The accumulative attention module continuously accumulates visual information during the journey of a vehicle, while the adjacent attention module propagates lane information from the previous frame to the current frame. Both modules are meticulously designed based on the transformer architecture. Finally, these long-short context features are fused with the current frame features to predict the lane lines in the current frame. Extensive quantitative and qualitative experiments are conducted on two prevalent benchmark datasets. The results demonstrate the effectiveness of our method, which achieves several new state-of-the-art records. The code and models are available at https://github.com/Alex-1337/LaneTCA
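The abstract describes one long-term module (accumulating context over the whole drive), one short-term module (previous frame to current frame), and a fusion step. The following is a minimal, hypothetical sketch of how these pieces could fit together for a single frame. It assumes single-head scaled dot-product attention without learned projections, a fixed-size memory updated with an assumed momentum factor `alpha`, and additive fusion. The names `attention`, `TemporalContext`, and `step` are illustrative. This is not the authors' implementation; the linked repository contains that.

```python
import math
from typing import List, Optional

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return out

class TemporalContext:
    """Hypothetical per-video state holding long-term and short-term context."""

    def __init__(self, alpha: float = 0.9) -> None:
        self.alpha = alpha                           # assumed momentum of the memory update
        self.memory: Optional[List[Vector]] = None   # accumulated long-term context
        self.prev: Optional[List[Vector]] = None     # previous frame (short-term context)

    def step(self, current: List[Vector]) -> List[Vector]:
        """Process one frame's tokens and return context-enhanced features."""
        if self.memory is None or self.prev is None:
            # First frame: no history yet, so the memory starts from the current frame.
            self.memory = [list(v) for v in current]
            self.prev = [list(v) for v in current]
            return [list(v) for v in current]
        # Long-term context: current tokens attend to the accumulated memory.
        long_ctx = attention(current, self.memory, self.memory)
        # Short-term context: current tokens attend to the previous frame only.
        short_ctx = attention(current, self.prev, self.prev)
        # Fusion (assumed additive) of the current features with both contexts.
        fused = [[c + l + s for c, l, s in zip(cv, lv, sv)]
                 for cv, lv, sv in zip(current, long_ctx, short_ctx)]
        # Accumulate: memory slots read the current frame, then blend with momentum,
        # so the memory keeps a fixed size no matter how long the video is.
        update = attention(self.memory, current, current)
        self.memory = [[self.alpha * m + (1 - self.alpha) * u for m, u in zip(mv, uv)]
                       for mv, uv in zip(self.memory, update)]
        self.prev = [list(v) for v in current]
        return fused

# Usage: feed frames one at a time; each frame here has two 2-D tokens.
ctx = TemporalContext(alpha=0.9)
video = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]]
for frame in video:
    features = ctx.step(frame)
```

In this sketch, only one previous frame and one fixed-size memory are kept per video. The cost of each step therefore does not grow with the number of frames already seen, in contrast to models that take a fixed stack of historical frames as input.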
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve the following problems in video lane detection:

1. **Existing methods fail to fully use the temporal context between frames**: Consecutive video frames carry rich temporal context, but existing lane detectors do not explore it sufficiently. These methods usually focus only on the information in a single frame and ignore the correlation between frames.
2. **Complexity and limitations of multi-frame input**: Some existing methods require several historical frames as input. This increases computational cost, and inaccurate predictions on the historical frames can degrade the prediction for the current frame. In addition, a fixed number of input frames limits how far back in time the model can draw information.
3. **Effective fusion of long-term and short-term information**: Combining short-term (adjacent-frame) and long-term (accumulated) historical information effectively is challenging, and existing methods do not perform well in this regard.

To solve these problems, the authors propose LaneTCA (Temporal Context Aggregation for Video Lane Detection). It abstracts the long-term and short-term temporal context with an **accumulative attention module** and an **adjacent attention module**, respectively, and fuses this information with the current frame features to improve lane detection accuracy. The main contributions of LaneTCA are:

- A new framework for effective temporal context aggregation in video lane detection.
- Accumulative attention and adjacent attention modules that capture long-range and short-range temporal information, removing the restriction to a fixed number of input frames.
- Extensive experiments on popular benchmark datasets that demonstrate the effectiveness and superior performance of the method.
Through these improvements, LaneTCA detects lane lines more accurately in complex driving scenarios, and its predictions show better temporal and spatial continuity.
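To make the fixed-window limitation from problem 2 concrete, here is a toy illustration on a scalar "lane evidence" signal. It compares a sliding window over the last `k` frames with a recurrent memory carried across the whole video. The exponential moving average with factor `alpha` is an assumed stand-in for a learned accumulation. It is not LaneTCA's accumulative attention, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, List

def sliding_window_context(signal: List[float], k: int) -> List[float]:
    """Fixed-length multi-frame input: average over the last k frames only."""
    window: Deque[float] = deque(maxlen=k)  # older frames fall out automatically
    out = []
    for x in signal:
        window.append(x)
        out.append(sum(window) / len(window))
    return out

def accumulated_context(signal: List[float], alpha: float) -> List[float]:
    """Recurrent memory (assumed EMA): a single state carried across the whole video."""
    memory = None
    out = []
    for x in signal:
        memory = x if memory is None else alpha * memory + (1 - alpha) * x
        out.append(memory)
    return out

# Lane evidence visible only in frame 0, then occluded for five frames.
signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
window_out = sliding_window_context(signal, k=3)
memory_out = accumulated_context(signal, alpha=0.8)
```

With `k=3`, the windowed context drops to 0.0 from frame 3 onward because frame 0 has left the window. The accumulated memory decays geometrically (about 0.8, 0.64, 0.51, ...) and still carries a trace of frame 0. The window must also store and process `k` frames at every step, while the recurrent state has a fixed size.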