Abstract: Objective In unmanned aerial vehicle systems, real-time estimation of scene information is key to automatic obstacle avoidance and navigation. A binocular stereo vision system is an effective means of obtaining scene information: it mimics the working principle of human eyes by using two cameras to capture the same scene at the same time and generates a disparity map with a stereo matching algorithm. In this work, we propose ADCC-TSGM, a novel texture-optimized semi-global stereo matching algorithm based on the fusion of the absolute difference (AD) feature and the center average census feature. The algorithm is accelerated through CUDA parallelization.

Method First, a one-dimensional difference method is used to calculate the texture information along the epipolar line, the center average census feature and the AD feature are combined to compute the matching cost, and the semi-global matching algorithm is texture-optimized to aggregate the cost and obtain the initial disparity. Second, a left-right consistency check is used to detect unstable and occluded pixels, and linear interpolation and median filtering are used to fill the holes in the disparity map. Lastly, to improve the running speed, we optimize the GPU implementation of each stereo matching step. In feature calculations such as the center average census, memory access takes much more time than computation, and a large number of data-intensive computations are shared between adjacent threads. Consequently, we divide the data required by an entire thread block into four regions, copy them into shared memory, and compute from shared memory to reduce the memory access overhead. A single thread can handle two consecutive disparity calculations simultaneously by using SIMD instructions. While the GPU is processing, the CPU is largely idle; therefore, a hybrid pipeline is designed to fully utilize the computing resources of the embedded platform.

Result To demonstrate the effectiveness of the proposed algorithm, we use the NVIDIA Jetson TK1 developer kit, which has a quad-core ARM Cortex-A15 CPU, a Kepler GPU with 192 CUDA cores, and 2 GB of memory, as the embedded computing platform and conduct experiments on Middlebury stereo datasets resized to QVGA resolution. Given the target application scenarios and the image resolution, the maximum disparity for each algorithm is set to 64, and the block matching window size of SGBM and BM is set to 9 × 9. The texture penalty coefficients ε1 and ε2 in the proposed algorithm are set to 0.25 and 0.125, respectively. Experimental results show that the total bad-pixel rate and the average error rate of the proposed algorithm are significantly lower than those of BM, SGBM, and SGM. The total bad-pixel rate of ADCC-TSGM is 73.9% lower than that of BM, 36.1% lower than that of SGBM, and 28.3% lower than that of SGM. The average error rate of the proposed algorithm is 83.2% lower than that of BM, 44.5% lower than that of SGBM, and 49.9% lower than that of SGM. In particular, using the center average census in feature matching reduces both the bad-pixel and error rates. The texture-based optimization adaptively increases the penalty coefficient in low-texture regions and reduces the average error rate from 6.62 to 4.84. The post-processing stage, which includes the disparity consistency check and hole filling, reduces the total bad-pixel rate from 14.46 to 7.12. With GPU parallel acceleration, the CUDA implementation of the proposed algorithm is hundreds of times faster than the pure CPU implementation, without any loss in disparity map quality. Compared with SGBM, which has already been optimized with SIMD and multi-core parallelism, the proposed algorithm reduces the running time by 85%. At QVGA resolution, the frame processing rate reaches 31.8 FPS.

Conclusion The proposed algorithm outperforms existing algorithms that are used in industry, such as BM, SGM, and SGBM. The CUDA-accelerated implementation provides an effective and feasible way to obtain high-quality disparity information and can serve as a basic means of environmental perception, visual positioning, and map construction for real-time embedded applications, such as micro-aircraft systems.
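
The post-processing stage described in the Method section (a left-right consistency check followed by hole filling) can be illustrated on a single scanline. The sketch below is not the paper's implementation: it is written in plain Python rather than CUDA, the function names `lr_check_row` and `fill_row_linear`, the `INVALID` marker, and the tolerance `max_diff=1` are all assumptions made for illustration, and the median filtering step is omitted.

```python
INVALID = -1  # assumed marker for pixels rejected by the consistency check


def lr_check_row(disp_left, disp_right, max_diff=1):
    """Check one scanline of the left disparity map against the right one.

    A left pixel at column x with disparity d should match the right pixel
    at column x - d. The pixel is kept only if that column exists and the
    two disparities differ by at most max_diff (assumed tolerance).
    """
    width = len(disp_left)
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right image
        if d < 0 or xr < 0 or xr >= width or abs(d - disp_right[xr]) > max_diff:
            out.append(INVALID)
        else:
            out.append(d)
    return out


def fill_row_linear(row):
    """Fill runs of INVALID pixels by linear interpolation.

    Each run is interpolated between its nearest valid neighbours; a run
    touching the image border copies its single valid neighbour.
    """
    out = list(row)
    width = len(out)
    x = 0
    while x < width:
        if out[x] != INVALID:
            x += 1
            continue
        start = x
        while x < width and out[x] == INVALID:
            x += 1
        left = out[start - 1] if start > 0 else None
        right = out[x] if x < width else None
        if left is None and right is None:
            break  # the whole row is invalid, nothing to interpolate from
        if left is None:
            left = right
        if right is None:
            right = left
        span = x - start + 1
        for i in range(start, x):
            t = (i - start + 1) / span
            out[i] = left + (right - left) * t
    return out


# Usage: the pixel with disparity 5 has no consistent match and is refilled.
checked = lr_check_row([2, 2, 2, 2, 5, 2], [2, 2, 2, 2, 2, 2])
filled = fill_row_linear(checked)
```

In a GPU version each pixel of the check is independent, so one thread per pixel is a natural mapping, whereas the run-based filling shown here is sequential along the scanline and would need a different formulation.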

Semi-global stereo matching algorithm based on feature fusion and its CUDA implementation

Realization of CUDA-based Real-Time Multi-Camera Visual SLAM in Embedded Systems

Efficient Visual Odometry and Mapping for Unmanned Aerial Vehicle Using ARM-based Stereo Vision Pre-Processing System

A fast stereo matching algorithm suitable for embedded real-time systems