Abstract: Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by the observations that road lines and markings are (1) frequently occluded in the presence of moving vehicles, shadow, and glare and (2) highly structured with low intra-class shape variance and overall high appearance consistency. To solve these issues, we propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues facilitating the correct classification of the partially occluded road lines or markings. To reduce computational complexity, a novel surface normal estimator is proposed to establish spatial correspondences between the sampled frames, allowing the HomoFusion module to perform a pixel-to-pixel attention mechanism in updating the representation of the occluded road lines or markings. Experiments on ApolloScape, a large-scale lane mark segmentation dataset, and ApolloScape Night with artificial simulated night-time road conditions, demonstrate that our method outperforms other existing SOTA lane mark segmentation models with less than 9% of their parameters and computational complexity. We show that exploiting available camera intrinsic data and ground plane assumption for cross-frame correspondence can lead to a light-weight network with significantly improved performances in speed and accuracy. We also prove the versatility of our HomoFusion approach by applying it to the problem of water puddle segmentation and achieving SOTA performance.

What problem does this paper attempt to address?

The paper addresses lane line and road marking segmentation for autonomous driving. Its focus is improving segmentation accuracy when lines and markings are partially occluded, which is common in real driving environments. The key contributions are:

1. **Homography Guided Fusion (HomoFusion) Module**: This module uses complementary information from adjacent video frames to recover occluded lane lines or markings. A homography estimated between frames establishes spatial correspondences between pixels, and the representation of an occluded region is updated from its corresponding pixels in the other frames, which improves classification accuracy.
2. **Road Surface Normal Estimator (RSNE)**: This is a novel method for estimating the road surface normal. Combined with the intrinsic and extrinsic parameters of the vehicle-mounted camera, the estimated normal yields an accurate homography matrix and thus the spatial correspondences between frames. RSNE simplifies homography estimation from an 8-degree-of-freedom (DoF) problem to estimating a normal vector with only 2 degrees of freedom.
3. **Lightweight Lane Marking Segmentation Model**: Compared with existing methods, the model significantly reduces complexity and computational requirements without sacrificing performance. It outperforms current state-of-the-art (SOTA) methods while using less than 9% of their parameters and computational load.

Experiments on the ApolloScape dataset and on ApolloScape Night, which simulates night-time conditions, show superior performance, particularly under occlusion, shadows, and glare.
Additionally, the authors discuss in detail the selection of key hyperparameters in the model and further validate the effectiveness of the proposed method through ablation studies.
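To make the 8-DoF to 2-DoF reduction concrete, the following is a minimal sketch of the textbook plane-induced homography, not the paper's implementation. It assumes the camera intrinsics `K`, the relative motion `R`, `t` between two frames, and the camera height `d` above the road are known, so only the road normal remains to be estimated. The function names, the pitch/roll parameterization of the normal, and the sign conventions are illustrative assumptions.

```python
import numpy as np

def normal_from_angles(pitch, roll):
    """Unit road normal from two angles (hypothetical 2-DoF parameterization).

    Camera axes are assumed to be x right, y down, z forward, so a level
    camera over a flat road gives the normal (0, 1, 0).
    """
    return np.array([
        -np.sin(roll),
        np.cos(pitch) * np.cos(roll),
        np.sin(pitch) * np.cos(roll),
    ])

def plane_homography(K, R, t, n, d):
    """Homography mapping frame-1 pixels of road points to frame-2 pixels.

    Assumes X2 = R @ X1 + t and that road points satisfy n @ X1 = d in the
    frame-1 camera coordinates (d is the camera height above the road).
    """
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def warp_pixel(H, uv):
    """Apply a homography to one pixel (u, v) and dehomogenize."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

# Example with made-up calibration values: the vehicle drives 1 m forward.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])
n = normal_from_angles(pitch=0.02, roll=-0.01)
H = plane_homography(K, R, t, n, d=1.5)
print(warp_pixel(H, (700.0, 600.0)))
```

With `K`, `R`, `t`, and `d` fixed, the only free quantities in `H` are the two angles that define `n`, which is the sense in which the estimation problem shrinks to 2 degrees of freedom. The mapping is valid only for pixels that actually lie on the road plane.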

Homography Guided Temporal Fusion for Road Line and Marking Segmentation
