Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang
2024-07-16
Abstract: When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by the traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception purposes, taking into consideration the 3D positions of both lane edges and attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, the process begins with the transformer encoding multi-camera images to extract the above features and predicting lane-level perception results. Next, the hierarchical feature early fusion module refines the features for predicting planning attributes. Finally, the double-edge interpreter utilizes a late-fusion process specifically designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three Carla benchmarks show significant improvements in driving score of 27.20%, 33.47%, and 15.54% over existing algorithms, respectively, achieving the state-of-the-art performance, with the system operating up to 22.57 FPS.
Computer Vision and Pattern Recognition, Robotics
### What problem does this paper attempt to address?

This paper addresses the neglect of traffic elements (lanes, intersections, traffic rules, and dynamic agents) in autonomous driving planning. Traditional end-to-end planning methods often overlook these elements, which leads to inefficient planning and potential non-compliance with traffic rules.

To close this gap, the paper proposes **Perception Helps Planning (PHP)**, a framework that integrates lane-level perception with planning so that the planning process inherently follows traffic constraints, resulting in safer and more efficient driving. PHP represents each lane by its two edges (the left and right lane lines) and attaches attributes for lane intersections, lane directions, lane occupancy, and planning.

#### Summary of main problems

1. **Limitations of traditional end-to-end planning methods**: These methods usually ignore traffic elements such as lanes and intersections, which makes planning less precise and less safe.
2. **How to efficiently integrate perception and planning**: Existing methods either separate perception and planning completely or chain them sequentially, without in-depth interaction between the two.
3. **Improving the safety and efficiency of planning**: By introducing the double-edge structure and the Transformer model, PHP performs finer-grained perception and planning at the lane level, which improves overall performance.

### Overview of solutions

- **Double-edge data structure**: Represents the left and right edges of a lane together with their attributes (intersection, direction, occupancy, and planning).
- **Transformer encoder and decoder**: Extract features from multi-camera images and predict lane-level perception results.
- **Hierarchical feature early-fusion module**: Strengthens the correlation between features through attention so that planning attributes are predicted more accurately.
- **Goal-guided planning branch**: Uses cross-attention to strengthen the interaction between target points and features, improving planning precision.
- **Double-edge interpreter**: Fuses perception and planning information at the result level (late fusion) to generate vehicle control signals.

With these components, PHP outperforms existing algorithms on three Carla benchmarks, improving the driving score by 27.20%, 33.47%, and 15.54%, respectively, while running at up to 22.57 frames per second.
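To make the double-edge idea more concrete, here is a minimal Python sketch of how a lane described by two edges plus attributes might be stored and turned into a control target. It is an illustration only, not the authors' implementation: the names `DoubleEdgeLane` and `interpret`, the attribute encoding, the ego-frame convention (x forward, y left), and the look-ahead steering rule are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class DoubleEdgeLane:
    """Hypothetical container: one lane given by its two edges and attributes."""
    left_edge: List[Point3D]     # 3D points along the left lane line (ego frame)
    right_edge: List[Point3D]    # matching 3D points along the right lane line
    is_intersection: bool = False
    direction: str = "forward"   # assumed encoding: "forward" or "opposite"
    occupied: bool = False
    planned: bool = False        # planning attribute: lane chosen for driving

    def centerline(self) -> List[Point3D]:
        # Midpoint of each pair of corresponding left/right edge points.
        return [
            tuple((l + r) / 2.0 for l, r in zip(lp, rp))
            for lp, rp in zip(self.left_edge, self.right_edge)
        ]


def interpret(lanes: List[DoubleEdgeLane],
              lookahead: float = 5.0,
              cruise_speed: float = 6.0) -> Tuple[float, float]:
    """Toy result-level fusion: returns (steering angle in rad, target speed)."""
    candidates = [
        ln for ln in lanes
        if ln.planned and ln.direction == "forward"
        and ln.left_edge and ln.right_edge
    ]
    if not candidates:
        return 0.0, 0.0  # no usable planned lane: keep straight and stop
    lane = candidates[0]
    center = lane.centerline()
    # First centerline point at least `lookahead` metres away, else the last one.
    target = next(
        (p for p in center if math.hypot(p[0], p[1]) >= lookahead),
        center[-1],
    )
    steer = math.atan2(target[1], target[0])  # heading error toward the target
    speed = 0.0 if lane.occupied else cruise_speed  # stop for an occupied lane
    return steer, speed


xs = (2.0, 4.0, 6.0, 8.0)
straight = DoubleEdgeLane(
    left_edge=[(x, 1.75, 0.0) for x in xs],
    right_edge=[(x, -1.75, 0.0) for x in xs],
    planned=True,
)
steer, speed = interpret([straight])
```

In this sketch, perception attributes (direction, occupancy) and the planning attribute (`planned`) are only combined once both are available as results, which loosely mirrors the late-fusion role the summary assigns to the double-edge interpreter. A real system would replace the hand-written rules with the learned components listed above.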