Fine-grained CUDA-based Parallel Intra Prediction for H.264/AVC.

Wenbin Jiang, Min Long, Hai Jin, Pengcheng Wang
DOI: https://doi.org/10.1145/2597176.2578266
2014-01-01
Abstract: The computing power of Graphics Processing Units (GPUs) has grown substantially in recent years, yet previous GPU implementations of intra prediction fail to exploit the available parallelism efficiently: related work achieves only frame-level, slice-level, or block-level parallelism. Implementing fine-grained parallelism, such as pixel-level and mode-level parallelism, on the Compute Unified Device Architecture (CUDA) is challenging because the irregular formulas of intra prediction and the constraints imposed by H.264/AVC introduce a large number of branch instructions, which the CUDA architecture handles poorly. This paper presents a CUDA-based approach that adopts fine-grained parallelism. By transforming the various prediction formulas into a common form and introducing the predictor unit, a lookup-table-based algorithm is proposed that efficiently eliminates the branches. In addition, the combinatorial frame technique and an optimized encoding order are adopted to maximize parallelism. Experimental results show that a significant reduction in encoding time can be achieved and that the proposed algorithm outperforms previous works.
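The idea of rewriting different prediction formulas in one common form and driving them from a lookup table can be illustrated with a small CPU-side sketch. It is not the paper's implementation: the three-tap form, the table layout, the neighbour ordering in `ref`, and all names (`build_table`, `predict_pixel`, `predict_block`) are assumptions made for illustration, and only three of the nine H.264/AVC 4x4 luma modes are covered (DC, which averages more than three samples, is left out). Each predicted pixel is computed as (w0*ref[i0] + w1*ref[i1] + w2*ref[i2] + 2) >> 2 with weights summing to 4, so the per-pixel routine contains no mode-dependent branch; on a GPU, one thread per (mode, y, x) triple could evaluate it.

```python
# Assumed neighbour layout (hypothetical, for illustration only):
#   ref[0..7]  = reconstructed pixels above and above-right of the block (A..H)
#   ref[8..11] = reconstructed pixels to the left of the block (I..L)
VERTICAL, HORIZONTAL, DIAG_DOWN_LEFT = 0, 1, 2


def build_table():
    """Build a (mode, y, x) -> three (index, weight) taps table.

    Mode-specific special cases are resolved here, once, so that the
    per-pixel routine below needs no conditional on the mode.
    """
    table = {}
    for y in range(4):
        for x in range(4):
            # Vertical: copy the pixel above; (4*p + 2) >> 2 == p.
            table[(VERTICAL, y, x)] = ((x, 4), (0, 0), (0, 0))
            # Horizontal: copy the pixel to the left.
            table[(HORIZONTAL, y, x)] = ((8 + y, 4), (0, 0), (0, 0))
            # Diagonal down-left: 1-2-1 filter along the top row,
            # with the standard special case at the bottom-right pixel.
            if x == 3 and y == 3:
                taps = ((6, 1), (7, 3), (0, 0))
            else:
                s = x + y
                taps = ((s, 1), (s + 1, 2), (s + 2, 1))
            table[(DIAG_DOWN_LEFT, y, x)] = taps
    return table


TABLE = build_table()


def predict_pixel(ref, mode, y, x):
    """Evaluate the common three-tap form for one pixel, with no mode branch."""
    (i0, w0), (i1, w1), (i2, w2) = TABLE[(mode, y, x)]
    return (w0 * ref[i0] + w1 * ref[i1] + w2 * ref[i2] + 2) >> 2


def predict_block(ref, mode):
    """Predict a full 4x4 block; on a GPU each pixel would map to one thread."""
    return [[predict_pixel(ref, mode, y, x) for x in range(4)] for y in range(4)]
```

In this sketch the only conditional sits in the table construction, which runs once ahead of time; a CUDA port would presumably keep the table in constant or shared memory, though that placement is likewise an assumption rather than something stated in the abstract.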