LinkOcc: 3D Semantic Occupancy Prediction with Temporal Association

Wenzhe Ouyang, Zenglin Xu, Bin Shen, Jinghua Wang, Yong Xu
DOI: https://doi.org/10.1109/tcsvt.2024.3486019
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: 3D semantic occupancy has garnered considerable attention because it captures rich structural information about the entire autonomous driving scene. However, existing 3D occupancy prediction methods are typically tailored for single-frame inputs, resulting in unsatisfactory performance and temporal inconsistencies in real-world continuous scenarios. In this paper, we introduce LinkOcc, a sparse-query approach that incorporates an efficient temporal association mechanism for 3D semantic occupancy prediction. LinkOcc is conceptually built on the prevalent DETR-like framework for 2D segmentation, and we construct the temporal association mechanism on this basis. Specifically, we propose a near-online training strategy that jointly trains on two adjacent frames, combining the benefits of both online and offline methods. Moreover, we introduce a temporal association strategy that uses contrastive learning to learn discriminative features for cross-frame semantic-level association. Comprehensive experiments demonstrate that LinkOcc not only surpasses state-of-the-art methods in 3D occupancy prediction but also achieves promising performance on foreground classes.
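To make the idea of contrastive cross-frame association concrete, the following is a minimal sketch and not the authors' implementation. The abstract only states that contrastive learning is used; the InfoNCE-style form of the loss, the cosine similarity, the temperature value, and the names `cosine` and `association_loss` are all assumptions chosen for illustration. Each query embedding from the current frame is pulled toward previous-frame embeddings that share its semantic label and pushed away from the rest.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def association_loss(prev_queries, prev_labels, curr_queries, curr_labels,
                     temperature=0.1):
    """Hypothetical InfoNCE-style loss over two adjacent frames.

    For each current-frame query, previous-frame queries with the same
    semantic label are treated as positives and all others as negatives.
    Queries with no positive in the previous frame are skipped.
    """
    losses = []
    for query, label in zip(curr_queries, curr_labels):
        logits = [cosine(query, p) / temperature for p in prev_queries]
        positives = [l for l, lab in zip(logits, prev_labels) if lab == label]
        if not positives:
            continue
        # -log( sum over positives / sum over all candidates )
        losses.append(log_sum_exp(logits) - log_sum_exp(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy embeddings for two adjacent frames (illustrative values only).
prev_queries = [[1.0, 0.0], [0.0, 1.0]]
prev_labels = ["car", "road"]
curr_queries = [[0.9, 0.1], [0.1, 0.9]]

aligned = association_loss(prev_queries, prev_labels,
                           curr_queries, ["car", "road"])
misaligned = association_loss(prev_queries, prev_labels,
                              curr_queries, ["road", "car"])
print(aligned < misaligned)
```

In this toy setup the loss is small when same-class embeddings are close across frames and large when they are not, which is the property a contrastive objective of this kind would encourage during joint training on adjacent frames.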