Adaptive subband partition encoding scheme for multiple audio objects using CNN and residual dense blocks mixture network

Yulin Wu, Ruimin Hu, Xiaochen Wang
DOI: https://doi.org/10.1016/j.eswa.2024.123323
IF: 8.5
2024-01-29
Expert Systems with Applications
Abstract: As the demand for user immersion and interaction in multimedia entertainment systems increases, spatial audio is widely used because it enables flexible control of audio objects. Spatial audio object coding (SAOC) is a technique for transmitting multiple audio objects in a compact format. However, decoding audio objects at high quality and without aliasing distortion at low bitrates remains a highly challenging task. We propose a highly efficient spatial audio coding system consisting of an adaptive subband partition strategy and convolutional neural networks (CNN) with residual dense blocks. The proposed method updates the side information to indicate the activity of each audio object in each time–frequency bin, enabling scalable transmission while maintaining the perceptual quality of each audio object. It has three main advantages: (1) a nonuniform perceptual frequency scale is used for frequency resolution, fully taking into account the active frequency bands of the audio objects; (2) adaptive subband partitioning is adopted to update the side information in each time–frequency bin based on the objects' energy; (3) a mixture network of CNN and residual dense blocks is designed for a multi-level compression strategy along the time, frequency, and object axes. Systematic evaluations and comparisons demonstrate that the proposed method outperforms the baselines in decoded object quality.
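The abstract names two encoder-side ideas without giving formulas: grouping STFT bins on a nonuniform perceptual frequency scale, and per-subband side information that flags which objects are active based on their energy. The sketch below illustrates both under stated assumptions and is not the authors' method. It assumes the Bark scale (Zwicker and Terhardt approximation) as the perceptual scale, one subband per integer Bark value, and a hypothetical activity rule: an object is flagged active in a subband if its energy is within `rel_threshold_db` of the strongest object there. The function names `bark_subbands` and `activity_side_info` and the -30 dB default are illustrative only.

```python
import math


def bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark scale (assumed perceptual scale)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_subbands(n_fft, fs):
    """Group the n_fft//2 + 1 STFT bins into contiguous subbands, one per integer
    Bark value. Returns a list of (start_bin, end_bin_exclusive) pairs."""
    n_bins = n_fft // 2 + 1
    bands = []
    start = 0
    current = int(bark(0.0))
    for k in range(1, n_bins):
        z = int(bark(k * fs / n_fft))  # Bark index of bin k's centre frequency
        if z != current:               # a new integer Bark value starts a new subband
            bands.append((start, k))
            start, current = k, z
    bands.append((start, n_bins))
    return bands


def activity_side_info(object_power, bands, rel_threshold_db=-30.0):
    """Hypothetical energy-based activity flags for one frame.

    object_power: one list of per-bin power values per audio object.
    Returns flags[b][j] = 1 if object j is considered active in subband b,
    i.e. its subband energy is within rel_threshold_db of the strongest object.
    """
    flags = []
    for start, end in bands:
        energies = [sum(p[start:end]) for p in object_power]
        peak = max(energies)
        if peak <= 0.0:
            # Silent subband: no object needs to be signalled.
            flags.append([0] * len(object_power))
            continue
        floor = peak * 10.0 ** (rel_threshold_db / 10.0)  # power-ratio threshold
        flags.append([1 if e >= floor else 0 for e in energies])
    return flags


if __name__ == "__main__":
    bands = bark_subbands(n_fft=512, fs=16000)
    # Object 0 occupies only low bins, object 1 only high bins.
    p0 = [1.0] * 50 + [0.0] * 207
    p1 = [0.0] * 200 + [1.0] * 57
    flags = activity_side_info([p0, p1], bands)
    print(len(bands), flags[0], flags[-1])
```

With 512-point frames at 16 kHz this yields 22 Bark-wide subbands that are narrow at low frequencies and wide at high ones, and the flags mark object 0 as active only in the lowest subbands and object 1 only in the highest. Transmitting such flags per subband, instead of per bin, is one plausible way to keep side information compact; the paper's actual partition and signalling rules may differ.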
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
What problem does this paper attempt to address?