Abstract: This paper introduces APCodec, a novel neural audio codec that targets high waveform sampling rates and low bitrates and integrates the strengths of parametric codecs and waveform codecs. Like parametric codecs, APCodec treats the amplitude and phase spectra as audio parametric features and handles them concurrently during encoding and decoding. It consists of an encoder and a decoder, both built on a modified ConvNeXt v2 backbone, connected by a quantizer based on residual vector quantization (RVQ). The encoder compresses the audio amplitude and phase spectra in parallel and merges them into a continuous latent code at a reduced temporal resolution, which the quantizer then quantizes. The decoder reconstructs the amplitude and phase spectra in parallel, and the decoded waveform is obtained by the inverse short-time Fourier transform. To ensure the fidelity of the decoded audio, as in waveform codecs, APCodec is trained with a combination of spectral-level loss, quantization loss, and generative adversarial network (GAN) based loss. To support low-latency streamable inference, we employ feed-forward layers and causal deconvolutional layers in APCodec and incorporate a knowledge distillation training strategy to enhance the quality of the decoded audio. Experimental results confirm that APCodec can encode 48 kHz audio at a bitrate of just 6 kbps with no significant degradation in the quality of the decoded audio. At the same bitrate, APCodec also delivers better decoded audio quality and faster generation speed than well-known codecs such as Encodec, AudioDec, and DAC.
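
The residual vector quantization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the APCodec implementation: the codebooks below are random and untrained, the helper names (`nearest`, `rvq_encode`, `rvq_decode`, `bitrate_bps`) are hypothetical, and a zero codeword is added to each stage purely so that this toy example never increases the residual. The abstract does not state the frame rate, number of stages, or codebook size, so the numbers used are illustrative assumptions only.

```python
import math
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the residual left by earlier stages."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bitrate_bps(frames_per_second, n_stages, codebook_size):
    """Bits per second when every frame transmits one index per RVQ stage."""
    return frames_per_second * n_stages * math.log2(codebook_size)


# Hypothetical toy setup (assumption): random codebooks; a real codec learns them.
rng = random.Random(0)
DIM, N_STAGES, CODEBOOK_SIZE = 4, 3, 16
codebooks = [
    # Index 0 is the zero vector; later stages draw smaller codewords.
    [[0.0] * DIM]
    + [[rng.gauss(0, 0.5 ** s) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE - 1)]
    for s in range(N_STAGES)
]

latent = [rng.gauss(0, 1) for _ in range(DIM)]   # stands in for one latent frame
indices = rvq_encode(latent, codebooks)          # what would be transmitted
reconstruction = rvq_decode(indices, codebooks)  # what the decoder would receive

print("indices:", indices)
print("squared error:", sq_error(latent, reconstruction))
# One illustrative configuration (assumed, not from the abstract) that yields 6 kbps.
print("bitrate:", bitrate_bps(150, 4, 1024), "bps")
```

Only the indices are transmitted, so the bitrate depends on the frame rate, the number of stages, and the codebook size rather than on the latent dimension. Adding stages refines the reconstruction at the cost of more bits per frame.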

APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding