Semi-global stereo matching algorithm based on feature fusion and its CUDA implementation
Niqi Lyu, Guanghua Song, Bowei Yang
DOI: https://doi.org/10.11834/jig.170157
2018-01-01
Journal of Image and Graphics
Abstract:
Objective: In unmanned aerial vehicle (UAV) systems, real-time estimation of scene information is a key issue for automatic obstacle avoidance and navigation. A binocular stereo vision system is an effective means of obtaining scene information. It simulates the working principle of the human eyes by using two cameras to capture the same scene at the same time, and it generates a disparity map with a stereo matching algorithm. In this work, we propose ADCC-TSGM, a novel texture-optimized semi-global stereo matching algorithm based on the fusion of the absolute difference (AD) feature and the center average census feature, and we accelerate the algorithm through CUDA parallel computing.
Method: First, a one-dimensional difference method is used to calculate texture information along the epipolar line. The center average census feature and the AD feature are combined in the cost computation, and a texture-optimized semi-global matching step aggregates the costs and yields the initial disparity. Second, a left-right consistency check detects unstable and occluded pixels, and linear interpolation and median filtering fill the holes in the disparity map. Lastly, to improve running speed, we optimize the GPU-accelerated code for each step of the stereo matching. For feature calculations such as the center average census, the time spent on memory access is much higher than the time spent on computation, and adjacent threads perform a large number of data-intensive computations on neighboring pixels. Consequently, we divide the data needed by an entire thread block into four regions, copy them into shared memory, and compute from shared memory to reduce memory-access overhead. Using SIMD instructions, a single thread handles two consecutive disparity calculations simultaneously. Because the CPU is basically idle while the GPU is processing, a hybrid CPU-GPU pipeline is designed to fully utilize the computing resources of the embedded platform.
Result: To demonstrate the effectiveness of the proposed algorithm, we use the NVIDIA Jetson TK1 developer kit, which has a quad-core ARM Cortex-A15 CPU, a Kepler GPU with 192 CUDA cores, and 2 GB of memory, as the embedded computing platform, and we conduct experiments on Middlebury stereo datasets resized to QVGA resolution. Considering the actual application scenarios and the image resolution, the maximum disparity for each algorithm is set to 64, and the block-matching window size of SGBM and BM is set to 9 × 9. The texture penalty coefficients ε1 and ε2 of the proposed algorithm are set to 0.25 and 0.125, respectively. Experimental results show that the total bad-pixel rate and the average error rate of the proposed algorithm are significantly lower than those of BM, SGBM, and SGM. The total bad-pixel rate of ADCC-TSGM is 73.9% lower than that of BM, 36.1% lower than that of SGBM, and 28.3% lower than that of SGM. The average error rate of the proposed algorithm is 83.2% lower than that of BM, 44.5% lower than that of SGBM, and 49.9% lower than that of SGM. In particular, using the center average census in feature matching reduces both the bad-pixel rate and the error rate. The texture-based optimization adaptively increases the penalty coefficient in low-texture regions and reduces the average error rate from 6.62 to 4.84. Post-processing, comprising the disparity consistency check and hole filling, reduces the total bad-pixel rate from 14.46 to 7.12. With GPU parallel acceleration, the CUDA implementation of the proposed algorithm runs hundreds of times faster than a pure CPU implementation, with no loss in disparity-map quality. Compared with SGBM, which has been optimized with SIMD and multi-core parallelism, the proposed algorithm reduces running time by 85%. At QVGA resolution, the frame rate reaches 31.8 FPS.
Conclusion: The proposed algorithm outperforms existing algorithms used in industry, such as BM, SGM, and SGBM. Its CUDA-accelerated implementation provides an effective and feasible method for obtaining high-quality disparity information and can serve as a basic means of environmental perception, visual positioning, and map construction in real-time embedded applications such as micro-aircraft systems.
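The abstract says texture is computed with a one-dimensional difference along the epipolar line but does not give the formula. The Python sketch below assumes rectified images, so epipolar lines are image rows, and uses a hypothetical measure, `row_texture`, equal to the mean absolute horizontal difference inside a window of plus or minus `radius` pixels. The window size and the averaging are illustrative choices, not the authors'; Python is used only for readability, whereas the paper's implementation is in CUDA.

```python
def row_texture(img, radius=2):
    """HYPOTHETICAL texture measure: mean absolute 1D difference along each row.

    In a rectified stereo pair the epipolar lines are image rows, so only
    horizontal differences are used. Flat (low-texture) regions give values near 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        row = img[y]
        # d[i] = |row[i+1] - row[i]|, defined for i = 0 .. w-2
        d = [abs(row[i + 1] - row[i]) for i in range(w - 1)]
        for x in range(w):
            lo = max(0, x - radius)
            hi = min(w - 1, x + radius)
            seg = d[lo:hi]  # differences between consecutive pixels lo..hi
            out[y][x] = sum(seg) / len(seg) if seg else 0.0
    return out
```

Low values of this map mark the low-texture regions where, according to the abstract, the matching penalty is increased.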
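The abstract does not define the center average census. One plausible reading, assumed here and labeled hypothetical, is a census transform whose reference value is the mean of the window rather than the raw center pixel, which makes the bit string less sensitive to noise on that single pixel. The sketch computes such a code for one pixel with replicate padding at the border and compares codes by Hamming distance; `center_average_census` and `hamming` are illustrative names.

```python
def center_average_census(img, y, x, r=1):
    """HYPOTHETICAL center-average census code for pixel (y, x).

    Each neighbor in the (2r+1) x (2r+1) window is compared with the window
    mean instead of the center pixel; bit = 1 if neighbor < mean.
    Out-of-image neighbors are clamped to the border (replicate padding).
    """
    h, w = len(img), len(img[0])
    vals = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            vals.append(img[yy][xx])
    mean = sum(vals) / len(vals)
    center = len(vals) // 2  # index of (dy, dx) = (0, 0) in row-major order
    code = 0
    for i, v in enumerate(vals):
        if i == center:
            continue  # the center position itself contributes no bit
        code = (code << 1) | (1 if v < mean else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")
```

Because each bit only records an ordering relative to the window mean, adding a constant brightness offset to the patch leaves the code unchanged, which is the usual reason census-style features tolerate exposure differences between the two cameras.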
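How the AD and census costs are fused is not specified in the abstract. The sketch below borrows the robust normalization of the well-known AD-Census cost, rho(c, lambda) = 1 - exp(-c / lambda), and sums the normalized AD and Hamming terms for every candidate disparity on one scanline. The lambda values, the maximum cost of 2.0 for disparities that fall outside the right image, and the name `fused_cost_row` are assumptions for illustration only.

```python
import math


def rho(c, lam):
    """Robust normalization (as in AD-Census): maps a raw cost into [0, 1)."""
    return 1.0 - math.exp(-c / lam)


def fused_cost_row(left, right, cen_left, cen_right, max_disp,
                   lam_ad=10.0, lam_census=30.0):
    """HYPOTHETICAL fused cost C[x][d] for one scanline.

    left/right: intensities; cen_left/cen_right: census codes per pixel.
    Pixel x in the left row is compared with pixel x - d in the right row.
    """
    costs = []
    for x in range(len(left)):
        row = []
        for d in range(max_disp):
            xr = x - d
            if xr < 0:
                row.append(2.0)  # no valid match: the largest possible fused cost
                continue
            ad = abs(left[x] - right[xr])
            ham = bin(cen_left[x] ^ cen_right[xr]).count("1")
            row.append(rho(ad, lam_ad) + rho(ham, lam_census))
        costs.append(row)
    return costs
```

Normalizing each term before summing keeps one feature from dominating simply because its raw range (0 to 255 for AD, 0 to the number of census bits for Hamming) is larger.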
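Semi-global matching aggregates costs along several 1D paths with a small penalty P1 for disparity changes of 1 and a larger penalty P2 for bigger jumps. The abstract states only that the texture-based optimization raises the penalty in low-texture regions; the exact roles of ε1 and ε2 are not given. The sketch below implements the standard SGM recurrence along a single left-to-right path and, as a hypothetical stand-in for the texture rule, scales P2 by up to `1 + gain` as the texture measure falls toward zero. `gain`, `t_ref`, and the penalty values are illustrative.

```python
def aggregate_left_to_right(cost, texture, p1=0.5, p2=2.0, gain=1.0, t_ref=10.0):
    """Aggregate matching costs along one SGM path (left to right on a scanline).

    cost[x][d] : matching cost of pixel x at disparity d
    texture[x] : texture measure of pixel x (low = flat region)
    HYPOTHETICAL texture rule: P2 grows linearly up to p2 * (1 + gain)
    as texture[x] drops from t_ref to 0; above t_ref it stays at p2.
    """
    width, n_disp = len(cost), len(cost[0])
    agg = [list(cost[0])]  # the first pixel on the path has no predecessor
    for x in range(1, width):
        prev = agg[-1]
        prev_min = min(prev)
        scale = 1.0 + gain * max(0.0, 1.0 - texture[x] / t_ref)
        p2_eff = max(p2 * scale, p1)  # keep P2 >= P1, as in standard SGM
        cur = []
        for d in range(n_disp):
            best = prev[d]                              # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)      # from neighbor d-1
            if d < n_disp - 1:
                best = min(best, prev[d + 1] + p1)      # from neighbor d+1
            best = min(best, prev_min + p2_eff)         # any larger jump
            # subtracting prev_min keeps values bounded along long paths
            cur.append(cost[x][d] + best - prev_min)
        agg.append(cur)
    return agg
```

For example, if the previous pixel strongly prefers d = 0 and the current pixel weakly prefers d = 3, this jump is accepted at a textured pixel but suppressed at a flat one because of the larger P2. A full implementation would sum 4 or 8 such paths before the winner-take-all step.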
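The post-processing step can be sketched per scanline as follows. `lr_check` applies the usual left-right consistency rule, keeping d_L(x) only if the right disparity map at x - d_L(x) agrees within `tol`; `fill_holes` linearly interpolates each invalid run between its nearest valid neighbors; and `median3` is a 3-tap median along the row. The tolerance of 1, the `INVALID = -1` marker, copying the nearest valid value at image borders, and the 1D rather than 2D median are assumptions of this sketch.

```python
INVALID = -1  # hypothetical marker for rejected (occluded or unstable) pixels


def lr_check(disp_left, disp_right, tol=1):
    """Left-right consistency check on one scanline of integer disparities."""
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching pixel in the right image
        if xr < 0 or abs(d - disp_right[xr]) > tol:
            out.append(INVALID)
        else:
            out.append(d)
    return out


def fill_holes(row):
    """Linear interpolation across invalid runs; borders copy the nearest valid value."""
    valid = [i for i, d in enumerate(row) if d != INVALID]
    if not valid:
        return list(row)
    out = list(row)
    for x, d in enumerate(row):
        if d != INVALID:
            continue
        left = max((i for i in valid if i < x), default=None)
        right = min((i for i in valid if i > x), default=None)
        if left is None:
            out[x] = row[right]
        elif right is None:
            out[x] = row[left]
        else:
            out[x] = row[left] + (row[right] - row[left]) * (x - left) / (right - left)
    return out


def median3(row):
    """3-tap median along the row; the two border pixels are left unchanged."""
    n = len(row)
    return [row[x] if x in (0, n - 1) else sorted(row[x - 1:x + 2])[1] for x in range(n)]
```

Running the three functions in order (check, fill, filter) mirrors the order described in the Method paragraph.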
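The abstract says the data needed by a thread block is split into four regions and staged in shared memory, without giving the layout. A common scheme, assumed here, lets a block_w x block_h thread block fill a (block_h + 2r) x (block_w + 2r) tile, that is, the block plus an apron of radius r, in four passes: the main area, the right strip, the bottom strip, and the bottom-right corner. The Python function below emulates that index mapping and records which thread writes each tile cell, so coverage can be checked; in a CUDA kernel each of the four passes would be a guarded global-to-shared copy followed by `__syncthreads()`. `tile_load_plan` is a hypothetical name.

```python
def tile_load_plan(block_w, block_h, r):
    """Emulate a HYPOTHETICAL four-region cooperative load of a shared-memory tile.

    A block_w x block_h thread block stages a (block_h + 2r) x (block_w + 2r)
    tile. Thread (ty, tx) writes up to four tile cells, one per region:
    main area, right strip, bottom strip, bottom-right corner.
    In CUDA, tile cell (sy, sx) would be copied from global pixel
    (block_y0 - r + sy, block_x0 - r + sx), clamped at the image border.
    Returns loaders[sy][sx] = list of threads that write that cell.
    """
    if 2 * r > block_w or 2 * r > block_h:
        raise ValueError("four regions suffice only when 2r <= block size")
    tile_w, tile_h = block_w + 2 * r, block_h + 2 * r
    loaders = [[[] for _ in range(tile_w)] for _ in range(tile_h)]
    for ty in range(block_h):
        for tx in range(block_w):
            for oy, ox in ((0, 0), (0, block_w), (block_h, 0), (block_h, block_w)):
                sy, sx = ty + oy, tx + ox
                if sy < tile_h and sx < tile_w:  # bounds guard, as in the kernel
                    loaders[sy][sx].append((ty, tx))
    return loaders
```

Each tile cell is written by exactly one thread, so after the barrier every thread can read its whole census window from shared memory instead of issuing repeated global-memory loads for pixels its neighbors also need.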
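The abstract states that SIMD instructions let one thread process two consecutive disparities but does not say which instructions. CUDA provides per-halfword SIMD intrinsics such as `__vaddus2`, `__vsubus2`, and `__vminu2` that operate on two unsigned 16-bit lanes packed in a 32-bit word; assuming these is our choice, not a detail from the paper. The Python sketch emulates those three intrinsics and applies them to one SGM recurrence step for disparities d and d+1 at once. The caller is assumed to supply the shifted neighbor pairs and to pad out-of-range disparities with 0xFFFF; `sgm_step_pair` and the packing helpers are hypothetical.

```python
MASK16 = 0xFFFF


def pack2(lo, hi):
    """Pack two unsigned 16-bit lanes into one 32-bit word (lo = disparity d, hi = d+1)."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2(word):
    return word & MASK16, (word >> 16) & MASK16


def vaddus2(a, b):
    """Per-halfword unsigned saturating add (emulates CUDA's __vaddus2)."""
    (a0, a1), (b0, b1) = unpack2(a), unpack2(b)
    return pack2(min(a0 + b0, MASK16), min(a1 + b1, MASK16))


def vsubus2(a, b):
    """Per-halfword unsigned saturating subtract (emulates CUDA's __vsubus2)."""
    (a0, a1), (b0, b1) = unpack2(a), unpack2(b)
    return pack2(max(a0 - b0, 0), max(a1 - b1, 0))


def vminu2(a, b):
    """Per-halfword unsigned minimum (emulates CUDA's __vminu2)."""
    (a0, a1), (b0, b1) = unpack2(a), unpack2(b)
    return pack2(min(a0, b0), min(a1, b1))


def sgm_step_pair(cost2, prev2, prev_dm1_2, prev_dp1_2, prev_min, p1, p2):
    """HYPOTHETICAL SGM recurrence step for disparities d and d+1 in one word.

    prev2      = (L[d],   L[d+1]) of the previous pixel on the path
    prev_dm1_2 = (L[d-1], L[d])   i.e. shifted by -1 disparity
    prev_dp1_2 = (L[d+1], L[d+2]) i.e. shifted by +1 disparity
    """
    p1x2 = pack2(p1, p1)
    jump = min(prev_min + p2, MASK16)  # broadcast min_k L[k] + P2 to both lanes
    best = vminu2(prev2, vaddus2(prev_dm1_2, p1x2))
    best = vminu2(best, vaddus2(prev_dp1_2, p1x2))
    best = vminu2(best, pack2(jump, jump))
    return vsubus2(vaddus2(cost2, best), pack2(prev_min, prev_min))
```

Saturating arithmetic is what makes 16-bit lanes safe here: a padded 0xFFFF cost plus P1 stays at 0xFFFF instead of wrapping around to a small value and winning the minimum.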
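Because the CPU is mostly idle while the GPU runs, the authors design a hybrid pipeline that overlaps the two; which stages run on the CPU is not stated in the abstract. The sketch below shows the general two-stage producer-consumer pattern with Python's `threading` and `queue` modules: the main thread runs a hypothetical `cpu_stage` on frame i+1 while a worker thread runs a hypothetical `gpu_stage` on frame i, and a bounded queue limits the number of frames in flight. In Python, real overlap only occurs when the stages release the GIL, for example while waiting on a GPU kernel or on I/O.

```python
import queue
import threading


def run_pipeline(frames, cpu_stage, gpu_stage, depth=2):
    """Two-stage pipeline sketch; results come back in input order.

    cpu_stage and gpu_stage are HYPOTHETICAL callables standing in for the
    CPU-side and GPU-side work. No error handling: if gpu_stage raises,
    the producer may block on a full queue.
    """
    handoff = queue.Queue(maxsize=depth)  # bounded: at most `depth` frames in flight
    results = []
    sentinel = object()  # signals the worker that no more frames will arrive

    def gpu_worker():
        while True:
            item = handoff.get()
            if item is sentinel:
                break
            results.append(gpu_stage(item))  # single worker + FIFO keeps order

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for frame in frames:
        handoff.put(cpu_stage(frame))  # blocks if the GPU stage falls behind
    handoff.put(sentinel)
    worker.join()
    return results
```

With two balanced stages, the steady-state time per frame approaches that of the slower stage rather than the sum of both, which is the gain the hybrid pipeline aims for.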
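Assuming the "x% lower" figures in the Result paragraph are relative reductions, the same definition applied to the reported ablation numbers gives the relative effect of each component, and the reported frame rate converts to a per-frame time:

```latex
\Delta_{\mathrm{rel}} = \frac{e_{\mathrm{before}} - e_{\mathrm{after}}}{e_{\mathrm{before}}}, \qquad
\underbrace{\frac{6.62 - 4.84}{6.62} \approx 26.9\%}_{\text{texture optimization, average error}}, \qquad
\underbrace{\frac{14.46 - 7.12}{14.46} \approx 50.8\%}_{\text{post-processing, bad pixels}}, \qquad
\frac{1000\ \mathrm{ms}}{31.8\ \mathrm{frames}} \approx 31.4\ \mathrm{ms\ per\ frame}
```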