Variable-rate Neural Speech Compression with Multi-scale Feature Extraction and Improved Entropy Modeling

Shaohan Sun, Yuzhuo Kong, Tong Chen, Zhan Ma
DOI: https://doi.org/10.1109/dcc58796.2024.00102
2024-01-01
Abstract: Speech coding is a form of data compression that aims to reduce the cost of storing and transmitting speech. Neural networks have been shown to compress speech efficiently, notably in methods based on vector quantization (VQ). However, the complexity of VQ makes it hard to integrate into frameworks and restricts compression to discrete bitrate points. This paper proposes a neural speech compression framework that achieves flexible-bitrate speech reconstruction through a compact latent representation and better entropy estimation.
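The contrast between discrete VQ bitrate points and entropy-coded rates can be illustrated with a little arithmetic. The sketch below is not the paper's method: the frame rate, codebook size, and symbol probabilities are hypothetical values chosen for illustration, and the function names are ours. It assumes VQ indices are sent with fixed-length codes, so the bitrate can only move in steps of one codebook, whereas an ideal entropy coder spends -log2(p) bits per symbol and so tracks the probability model continuously.

```python
import math


def vq_bitrate_bps(frames_per_second, num_codebooks, codebook_size):
    """Bitrate when each VQ index is sent with a fixed-length code.

    Each index costs log2(codebook_size) bits, so the rate changes only
    in whole-codebook steps.
    """
    return frames_per_second * num_codebooks * math.log2(codebook_size)


def entropy_bitrate_bps(frames_per_second, symbol_probs_per_frame):
    """Ideal entropy-coded bitrate under a probability model.

    Each symbol costs -log2(p) bits, where p is the probability the
    model assigns to the symbol that actually occurred.
    """
    bits_per_frame = sum(-math.log2(p) for p in symbol_probs_per_frame)
    return frames_per_second * bits_per_frame


# Hypothetical settings, for illustration only.
FRAMES_PER_SECOND = 50
CODEBOOK_SIZE = 1024  # 10 bits per index

# VQ: the reachable bitrates form a discrete ladder.
vq_points = [
    vq_bitrate_bps(FRAMES_PER_SECOND, n, CODEBOOK_SIZE) for n in range(1, 5)
]

# Entropy coding: a sharper probability model lowers the rate smoothly.
loose_model_rate = entropy_bitrate_bps(FRAMES_PER_SECOND, [0.25, 0.25, 0.25])
sharp_model_rate = entropy_bitrate_bps(FRAMES_PER_SECOND, [0.5, 0.5, 0.5])

if __name__ == "__main__":
    print("VQ bitrate points (bps):", vq_points)
    print("Entropy-coded rate, loose model (bps):", loose_model_rate)
    print("Entropy-coded rate, sharp model (bps):", sharp_model_rate)
```

With these assumed numbers, the VQ ladder has a 500 bps spacing, while the entropy-coded rate can land anywhere in between depending on how well the model predicts the latent symbols.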