Real-Time Stereo Image Depth Estimation Network with Group-Wise L1 Distance for Edge Devices Towards Autonomous Driving

Bifa Liang, Wei Wei, Jinhao Huang, Cheng Liu, Hong Yang, Ru Yang, Wenli Shang, Jun Li
DOI: https://doi.org/10.1109/tvt.2023.3284011
IF: 6.8
2023-01-01
IEEE Transactions on Vehicular Technology
Abstract: Depth estimation is an essential component of the 3D perception ability of autonomous vehicles. Real-time inference on power- or memory-constrained devices would expedite the progress of autonomous driving. In this paper, a lightweight stereo matching network is proposed to achieve both high accuracy and fast inference. A novel feature extractor that uses patch embeddings for downsampling, together with a distinctive pyramidal strategy, is proposed to obtain accurate disparity maps with fewer computational resources. Constructing the cost volume with the proposed group-wise L1 distance represents the measured feature similarity faster and more efficiently than group-wise correlation. A lightweight 3D aggregation network with fewer 3D convolutions is proposed to further improve accuracy and inference time. In addition, the TensorRT optimizer is employed to improve the inference speed of the model. Extensive experiments on the KITTI 2012 and KITTI 2015 datasets demonstrate that our model runs in real time on resource-constrained devices, achieving higher frame rates than contemporary state-of-the-art networks. Our pipeline can process 1232 × 368 resolution images at 33.8 to 73.5 frames per second on NVIDIA Jetson Nano with TensorRT optimization, while maintaining comparable accuracy. The achieved state-of-the-art trade-off between accuracy and runtime is significant for edge devices in autonomous driving.
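The abstract names a group-wise L1 distance cost volume but does not give its formula. As a rough illustration only, the sketch below shows one plausible reading in NumPy: feature channels are split into groups, and for each candidate disparity the mean absolute difference between left and right features is taken within each group. The function name groupwise_l1_cost_volume, the (C, H, W) feature layout, the mean reduction, and the zero fill for columns without a match are all assumptions made for this example, not details confirmed by the paper.

```python
import numpy as np


def groupwise_l1_cost_volume(left, right, max_disp, num_groups):
    """Illustrative group-wise L1 cost volume (assumed formulation).

    left, right: feature maps of shape (C, H, W).
    Returns an array of shape (num_groups, max_disp, H, W) where lower
    values mean more similar features.
    """
    c, h, w = left.shape
    if left.shape != right.shape:
        raise ValueError("left and right must have the same shape")
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    if not 0 < max_disp <= w:
        raise ValueError("max_disp must be in the range 1..W")

    ch_per_group = c // num_groups
    volume = np.zeros((num_groups, max_disp, h, w), dtype=left.dtype)

    for d in range(max_disp):
        # Left pixel at column x is compared with right pixel at column x - d.
        # Columns x < d have no counterpart and keep the zero fill (assumption).
        diff = np.abs(left[:, :, d:] - right[:, :, : w - d])
        grouped = diff.reshape(num_groups, ch_per_group, h, w - d)
        # Average the absolute differences over the channels of each group.
        volume[:, d, :, d:] = grouped.mean(axis=1)

    return volume
```

Unlike a correlation-based volume, this variant needs only subtractions and absolute values per channel, which is consistent with the efficiency argument in the abstract; the disparity with the lowest aggregated cost is the best match under this formulation.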
telecommunications; engineering, electrical & electronic; transportation science & technology