A Low-Bitrate Neural Audio Codec Framework with Bandwidth Reduction and Recovery for High-Sampling-Rate Waveforms

Yang Ai, Ye-Xin Lu, Xiao-Hang Jiang, Zheng-Yan Sheng, Rui-Chen Zheng, Zhen-Hua Ling
DOI: https://doi.org/10.21437/interspeech.2024-108
2024-01-01
Abstract: This paper proposes a novel neural audio codec framework that incorporates bandwidth reduction and recovery, which makes it suitable for scenarios with high sampling rates and low bitrates. The proposed framework consists of a two-stage-downsampling-based encoder, a quantizer, and a two-stage-upsampling-based decoder. The encoder first reduces the bandwidth of the high-sampling-rate waveform and then encodes it. The discrete tokens output by the quantizer are therefore derived from a low-sampling-rate waveform, which yields a low bitrate. The decoder decodes the low-sampling-rate waveform and finally recovers the original high-sampling-rate waveform through bandwidth recovery. Experiments confirm that the proposed framework achieves high-quality audio coding at a sampling rate of 48 kHz and a bitrate of only 1 kbps. This corresponds to a sixfold bitrate saving compared with baseline codecs that lack bandwidth reduction and recovery.
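The link between bandwidth reduction and bitrate can be checked with back-of-the-envelope arithmetic. The sketch below assumes a generic token-based codec that emits one token per codebook per frame, with a fixed hop length measured at whatever sample rate the codec works at. The function name `codec_bitrate`, the hop length of 80 samples, the single 1024-entry codebook, and the 48 kHz to 8 kHz reduction are all hypothetical illustration values, not settings reported in the paper. The sketch only shows why encoding a lower-rate waveform cuts the token rate.

```python
import math


def codec_bitrate(sample_rate_hz: int, hop_length: int,
                  num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for a codec emitting one token per codebook per frame.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    frames_per_second = sample_rate_hz / hop_length          # token frames per second
    bits_per_frame = num_codebooks * math.log2(codebook_size)  # bits to index all codebooks
    return frames_per_second * bits_per_frame


# Illustrative (assumed) settings, not taken from the paper.
HOP_LENGTH = 80        # samples per token frame at the codec's working rate
NUM_CODEBOOKS = 1      # a single quantizer codebook
CODEBOOK_SIZE = 1024   # 10 bits per token

# Baseline: tokens computed directly from the 48 kHz waveform.
baseline = codec_bitrate(48_000, HOP_LENGTH, NUM_CODEBOOKS, CODEBOOK_SIZE)

# With bandwidth reduction: tokens computed from an assumed 8 kHz waveform.
reduced = codec_bitrate(8_000, HOP_LENGTH, NUM_CODEBOOKS, CODEBOOK_SIZE)

print(f"baseline: {baseline:.0f} bps")        # baseline: 6000 bps
print(f"reduced:  {reduced:.0f} bps")         # reduced:  1000 bps
print(f"saving:   {baseline / reduced:.0f}x")  # saving:   6x
```

With the hop length and codebooks held fixed, the token rate scales linearly with the codec's working sample rate. Encoding at one sixth of the rate therefore divides the bitrate by six. The actual framework may reach its 1 kbps and sixfold figures with different downsampling factors and quantizer settings.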