SRCodec: Split-Residual Vector Quantization for Neural Speech Codec

Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu
DOI: https://doi.org/10.1109/ICASSP48485.2024.10445966
2024-01-01
Abstract: End-to-end neural speech coding achieves state-of-the-art performance by using residual vector quantization. However, quantizing the latent variables with as few bits as possible remains a challenge. In this paper, we propose SRCodec, a neural speech codec that relies on a fully convolutional encoder/decoder network with a specifically proposed split-residual vector quantization. In particular, it divides the latent representation into two parts with the same dimensions. We use two different quantizers to quantize the low-dimensional features and the residual between the low- and high-dimensional features. In addition, we propose a dual attention module in split-residual vector quantization to improve information sharing along both dimensions. Both subjective and objective evaluations demonstrate the effectiveness of the proposed method, which achieves higher reconstructed speech quality at 0.95 kbps than Lyra-v1 at 3 kbps and Encodec at 3 kbps.
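The split-residual step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the codebook sizes, the random (untrained) codebooks, and the choice to take the residual against the quantized low half are all assumptions made for illustration, and the dual attention module is omitted entirely.

```python
import numpy as np


def nearest_code(x, codebook):
    """Return (indices, codewords) of the nearest codebook entry for each row of x."""
    # Squared Euclidean distance between every frame and every codeword.
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]


def split_residual_quantize(z, cb_low, cb_res):
    """Hypothetical split-residual quantizer for latents z of shape (frames, dim)."""
    half = z.shape[1] // 2
    low, high = z[:, :half], z[:, half:]  # two parts with the same dimension

    # First quantizer: the low part.
    idx_low, low_hat = nearest_code(low, cb_low)

    # Second quantizer: the residual between the high part and the low part.
    # Assumption: the residual is taken against the *quantized* low part so
    # that a decoder can rebuild the high part from the two indices alone.
    idx_res, res_hat = nearest_code(high - low_hat, cb_res)

    z_hat = np.concatenate([low_hat, low_hat + res_hat], axis=1)
    return idx_low, idx_res, z_hat


# Toy usage with random data and random codebooks (illustrative sizes only).
rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))        # 5 frames, 8-dimensional latent
cb_low = rng.normal(size=(16, 4))  # 16 codewords for the low part
cb_res = rng.normal(size=(16, 4))  # 16 codewords for the residual
idx_low, idx_res, z_hat = split_residual_quantize(z, cb_low, cb_res)
```

In this toy setting each frame is represented by two codebook indices, one per quantizer, and each quantizer only searches a codebook of half the latent dimension.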