VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang
2024-09-05
Abstract: In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive keyframe extractor and interpolator, deployed respectively at the transmitter and receiver, which intelligently select key frames to minimize inter-frame redundancy and mitigate the cliff-effect under challenging channel conditions. In the second stage, we propose the semantic vector quantization encoder and decoder, placed respectively at the transmitter and receiver, which efficiently compress key frames using advanced indexing and spatial normalization modules to reduce redundancy. Additionally, we propose adjustable index selection and recovery modules, enhancing compression efficiency and enabling flexible compression ratio adjustment. Compared to the joint source-channel coding (JSCC) framework, the proposed framework exhibits superior compatibility with current digital communication systems. Experimental results demonstrate that VQ-DeepVSC achieves substantial improvements over the H.265 standard in both Multi-Scale Structural Similarity (MS-SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) metrics, particularly under low channel signal-to-noise ratio (SNR) or multi-path channels, highlighting the significantly enhanced transmission capabilities of our approach.
Networking and Internet Architecture
What problem does this paper attempt to address?
The main problem this paper addresses is the tension between the rapid growth of global video traffic and the limitations of traditional wireless transmission systems. Although traditional wireless video transmission systems have improved at optimizing the bit error rate (BER), they still struggle to ensure high-quality transmission. These systems focus mainly on compression efficiency and lack the semantic understanding and adaptability required under dynamic network conditions. Even H.265 does not solve the cliff-effect, which is the sharp decline in video transmission quality when the channel signal-to-noise ratio (SNR) falls below a critical threshold.

To solve these problems, the authors propose VQ-DeepVSC, a new framework based on dual-stage vector quantization, to enhance video transmission performance over wireless channels. The framework consists of two stages:

1. **First stage**: An adaptive keyframe extractor and an interpolator are designed and deployed at the transmitter and the receiver respectively. The goal of this stage is to reduce inter-frame redundancy by intelligently selecting key frames and to mitigate the cliff-effect under poor channel conditions.
2. **Second stage**: A semantic vector quantization encoder and decoder are proposed and placed at the transmitter and the receiver respectively. This stage efficiently compresses key frames through advanced indexing and spatial normalization modules, further reducing redundancy. In addition, the authors introduce adjustable index selection and recovery modules, which improve compression efficiency and allow flexible adjustment of the compression ratio.

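
The two stages can be illustrated with a toy sketch. It is not the authors' implementation: the paper uses learned neural modules, whereas this example assumes a fixed difference threshold for key-frame selection and a hand-written codebook for vector quantization. All function names (`select_key_frames`, `quantize`, `dequantize`) and values are hypothetical. It only shows the general idea that a transmitter can drop near-duplicate frames and then send codebook indices instead of raw feature vectors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_key_frames(frames: List[Vector], threshold: float) -> List[int]:
    """Stage 1 (toy stand-in): keep frame 0, then keep every frame whose
    distance to the last kept frame exceeds `threshold` (an assumed rule)."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if squared_distance(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Stage 2 (toy stand-in): map each feature vector to the index of its
    nearest codebook entry. Only these indices would be transmitted."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Receiver side: look the indices up in the shared codebook."""
    return [list(codebook[k]) for k in indices]


if __name__ == "__main__":
    # Four tiny "frames", each flattened to a 2-element vector (made-up data).
    frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.1]]
    key_ids = select_key_frames(frames, threshold=0.5)

    # A hypothetical 2-entry codebook shared by transmitter and receiver.
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    indices = quantize([frames[i] for i in key_ids], codebook)

    print(key_ids)                        # [0, 2]
    print(indices)                        # [0, 1]
    print(dequantize(indices, codebook))  # [[0.0, 0.0], [1.0, 1.0]]
```

Because the transmitted payload in this sketch is a list of integers, it could in principle be carried by an ordinary digital link, which is one plausible reading of the compatibility claim in the abstract. In the paper, dropped frames are recovered by the interpolator at the receiver, a step this sketch omits.
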
Experimental results show that, compared with the H.265 standard, VQ-DeepVSC achieves significant improvements in the multi-scale structural similarity (MS-SSIM) and learned perceptual image patch similarity (LPIPS) metrics, especially under low channel signal-to-noise ratio or multipath channel conditions, demonstrating stronger transmission capabilities.