Abstract: Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, the high computational complexity and strong data dependency of RDOQ impede its hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and a fully pipelined hardware architecture for AVS3 video coding. On the algorithm side, we update the run-level context used for rate estimation within each zig-zag scanline and propose an efficient form of the RD cost calculation for the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy-strategy-based algorithm is proposed so that the decision can be made in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated with single instruction multiple data (SIMD) instructions on the Intel x86 platform. On the hardware side, we propose a fully pipelined architecture with nine pipeline stages. This design can process multiple transform units in parallel when the transform unit height is less than 32. Experimental results show that the proposed algorithm achieves time savings of 31.37%, 28.58%, and 28.53% with average Bjøntegaard delta rate (BD-Rate) increases of 0.25%, 0.26%, and 0.27% under the all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation processes 32 coefficients per cycle, and its area is 1223.2K logic gates when working at 471.2 MHz. These results show that the proposed algorithm and hardware architecture achieve a good trade-off between coding efficiency and hardware throughput.
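
To make the OCL decision step more concrete, the following is a minimal sketch of a generic per-coefficient rate-distortion choice, J = D + λ·R, evaluated over a few candidate levels. It is not the paper's optimized RD cost form or its run-level context model: the function names `choose_level` and `toy_rate_bits`, the candidate set, and the rate model are all assumptions made for illustration only.

```python
def toy_rate_bits(level):
    """Hypothetical rate model (an assumption, not the AVS3 entropy coder):
    1 bit for a zero level, more bits as the level magnitude grows."""
    return 1 if level == 0 else 2 + 2 * level.bit_length()


def choose_level(coeff, qstep, lam, rate_bits=toy_rate_bits):
    """Pick the quantized level that minimizes J = D + lam * R for one coefficient.

    coeff     -- transform coefficient (signed)
    qstep     -- quantization step size (> 0)
    lam       -- Lagrange multiplier trading distortion against rate
    rate_bits -- callable returning an estimated bit cost for a level magnitude
    """
    abs_c = abs(coeff)
    base = int(abs_c / qstep + 0.5)          # nearest-level (hard) quantization
    candidates = {base, max(base - 1, 0), 0}  # illustrative candidate set

    best_level, best_cost = 0, float("inf")
    for level in sorted(candidates):          # ascending, so ties keep the smaller level
        dist = (abs_c - level * qstep) ** 2   # squared reconstruction error
        cost = dist + lam * rate_bits(level)
        if cost < best_cost:
            best_level, best_cost = level, cost

    sign = -1 if coeff < 0 else 1
    return sign * best_level, best_cost
```

With `lam = 0` the sketch falls back to plain nearest-level quantization, while a larger `lam` pushes small coefficients toward zero. Each coefficient is evaluated independently here, which is the kind of dependency-free formulation that lends itself to SIMD or pipelined evaluation; the context updates and LSC decision described in the abstract are deliberately left out.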

Implementation of AVS Jizhun Decoder with HW/SW Partitioning on a Coarse-Grained Reconfigurable Multimedia System

Implementation of Multi-Standard Video Decoder on a Heterogeneous Coarse-Grained Reconfigurable Processor

H.264 Parallel Decoder at HD Resolution on a Coarse-Grained Reconfigurable Multi-Media System

Parallelization Of Computing-Intensive Tasks Of The H.264 High Profile Decoding Algorithm On A Reconfigurable Multimedia System

H.264/AVC baseline profile decoder optimization on independent platform

An AVS HDTV Video Decoder Architecture Employing Efficient HW/SW Partitioning

Parallel Implementation of Computing-Intensive Decoding Algorithms of H.264 on Reconfigurable SoC

An AVS Video Decoder Design and Implementation Based on Parallel Algorithm

A Parallel Serial Filtering Mixed Advanced ID Interpolation Architecture for AVS

Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications

Pipelined Architecture Design of H.264/AVC CABAC Real-Time Decoding

An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform

A Real-Time Ultra-High Definition Video Decoder of AVS3 on Heterogeneous Systems

Parallelized RDOQ Algorithm and Fully Pipelined Hardware Architecture for AVS3 Video Coding

H.264/AVC Intra Predictor on a Coarse-Grained Reconfigurable Multi-Media System

Mapping of computing-intensive tasks in H.264 decoding to a reconfigurable processor

An Efficient Hardware Implementation for Intra Prediction of AVS Encoder

A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only)

Reconfigurable Video Coding Framework and Decoder Reconfiguration Instantiation of AVS

A High-Performance VLSI Architecture for CABAC Decoding in H.264/AVC

uAVS3d - Fast Decoder for the 3rd Generation Audio Video Coding Standard (AVS3)