Sparse Spatio-Temporal Representation with Adaptive Regularized Dictionary Learning for Low Bit-Rate Video Coding
Hongkai Xiong,Zhiming Pan,Xinwei Ye,Chang Wen Chen
DOI: https://doi.org/10.1109/tcsvt.2012.2221271
IF: 5.859
2013-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract:For promising vision-based video coding on low-quality data, this paper proposes a sparse spatio-temporal representation with adaptive regularized dictionary learning and develops a low bit-rate video coding scheme. In a reversed-complexity Wyner-Ziv coding manner, it selects a subset of key frames to code at original resolution, while the rest are down sampled and reconstructed by a sparse spatio-temporal approximation using key frames as a training dataset. Since primitive patches (geometry) are of low dimensionality and can be well learned from the primitive patches across frames in a scale space, a video frame is divided into three layers: a primitive layer, a nonprimitive coarse layer, and a nonprimitive smooth layer. The multiscale differential feature representations are invertible to facilitate reconstruction with dictionary learning, and the target is formulated as an optimization problem by constructing a sparse representation of 2-D patches and 3-D volumes over adaptive regularized dictionaries, a set of 2-D subdictionary pairs trained from primitive patches, and a 3-D dictionary trained from nonprimitive volumes. Specifically, the nonprimitive layer is constructed as volumes in to order keep it consistent along the motion trajectory, which enables sparse representations over a learned 3-D spatio-temporal dictionary. Through hierarchical bidirectional motion estimation and adaptive overlapped block motion compensation, the 3-D low-frequency and high-frequency dictionary pair is designed by the K-SVD algorithm to update the atoms for optimal sparse representation and convergence. In reconstruction, the lost high-frequency information of the down-sampled frames can be synthesized from the sparse spatio-temporal representation over the adaptive regularized dictionaries. Extensive experiments validate the compression efficiency of the proposed scheme versus H.264/AVC in terms of both objective and subjective comparisons.