A Parallel H.264 Encoder with CUDA: Mapping and Evaluation

Nan Wu, Mei Wen, Huayou Su, Ju Ren, Chunyuan Zhang
DOI: https://doi.org/10.1109/ICPADS.2012.46
2012-01-01
Abstract: Efficiently mapping a real-time HD video application onto graphics hardware is challenging. Developers must choose the right parallelism model, balance per-thread processing granularity across the massive computing resources of the GPU, and partition tasks between the CPU and GPU. This paper illustrates these mapping approaches through a case study of an HD H.264 encoder based on the X264 reference code, and evaluates it in depth on state-of-the-art CPU and GPUs. We first split most of the computing tasks into Single-Instruction Multiple-Thread (SIMT) kernels, which are then chained through certain input/output data streams. We then implement a complete H.264 encoder on the Compute Unified Device Architecture (CUDA) platform. Finally, we present methods for exploiting multi-level parallelism and memory efficiency when mapping H.264 code, which we use to increase execution efficiency on GPUs. Our experimental results show that, with CUDA, the encoder uses the GPU's computing resources efficiently and achieves real-time encoding performance.