Abstract: Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and be prone to errors when recovering the transmitted bits. In contrast, the recent semantic communication approach proposes sending the semantics and then regenerating semantically consistent content at the receiver without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that treats the communication problem as an inverse problem and is therefore robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple simultaneous degradations, including corruption noise and missing parts caused by transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and under different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/

What problem does this paper attempt to address?

This paper addresses two problems that arise when audio signals are transmitted over noisy channels: high bandwidth consumption and errors in recovering the original signal. Specifically, directly transmitting audio signals may consume a large amount of bandwidth, and errors are likely to occur when the receiver attempts to recover the transmitted bitstream. To overcome these problems, the authors propose an audio semantic communication framework based on a generative model. The framework transmits a low-dimensional representation of the audio signal together with its semantic information, so that the receiver can generate content semantically consistent with the original signal without exactly recovering the bitstream.

### Main Problems and Solutions

1. **Problem Description**:
   - **Bandwidth Occupation**: Directly sending audio signals occupies a large amount of bandwidth.
   - **Recovery Error**: When audio signals are transmitted over noisy channels, the receiver is likely to make errors in recovering the original bitstream.
2. **Solutions**:
   - **Generative Model**: Use a generative model (specifically, a diffusion model) to solve the communication problem by treating it as an inverse problem.
   - **Low-Dimensional Representation**: Send a low-dimensional representation of the audio signal and its semantic information instead of the complete audio signal.
   - **Semantic Consistency**: At the receiver, use the conditional diffusion model to generate a semantically consistent audio signal, recovering the audio content even when the received signal is noisy and partially lost.

### Technical Details

- **Problem Modeling**: Model the audio communication problem as an inverse problem, that is, recovering the original audio or its semantic content from noisy and partially lost signals.
- **Diffusion Model**: Use the diffusion model to handle the inverse problem. In the forward process, the diffusion model destroys the data distribution by gradually adding noise; in the reverse process, it gradually removes the noise to recover the original signal.
- **Range-Nullspace Decomposition**: Use range-nullspace decomposition to ensure that the generated content satisfies both the constraints of the inverse problem and the data distribution.

### Experimental Results

- **Denoising**: Across different peak signal-to-noise ratio (PSNR) conditions, the framework performs well in the denoising task, especially at low PSNR.
- **Inpainting**: When audio segments are partially lost, the framework generates semantically consistent audio content, with the largest gains when channel conditions are poor.

### Summary

This paper proposes a generative audio semantic communication framework. By transmitting low-dimensional representations and using a diffusion model to generate semantically consistent audio content at the receiver, it addresses the bandwidth occupation and recovery errors of traditional audio communication. Experimental results show that the framework outperforms competing methods in a real-world scenario and under different channel conditions.
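
The range-nullspace decomposition mentioned under Technical Details can be illustrated with a small linear example. The sketch below is not the authors' implementation: it assumes a noise-free channel that simply drops some samples, uses a random vector as a stand-in for the diffusion model's estimate, and all names (`range_nullspace_merge`, `keep`, `x_generated`) are hypothetical. It shows only the core idea: the part of the signal determined by the received data is taken from the observation, and the generator fills in the rest.

```python
import numpy as np

def range_nullspace_merge(A, y, x_generated):
    """Combine a received observation with a generated estimate.

    Range part  A^+ y               : fixed by the received observation y.
    Null part   (I - A^+ A) x_gen   : free content supplied by the generator.
    """
    A_pinv = np.linalg.pinv(A)              # Moore-Penrose pseudo-inverse of A
    identity = np.eye(A.shape[1])
    return A_pinv @ y + (identity - A_pinv @ A) @ x_generated

rng = np.random.default_rng(0)
n = 8
x_true = rng.standard_normal(n)             # stand-in for a latent audio frame
keep = np.array([0, 1, 2, 5, 6])            # hypothetical indices that survived the channel
A = np.eye(n)[keep]                         # selection operator: drops the lost samples
y = A @ x_true                              # received observation (assumed noise-free)

x_generated = rng.standard_normal(n)        # stand-in for a diffusion model's estimate
x_hat = range_nullspace_merge(A, y, x_generated)

# The merged signal reproduces the received samples exactly...
data_consistent = np.allclose(A @ x_hat, y)
# ...while the lost positions come entirely from the generator.
lost = np.setdiff1d(np.arange(n), keep)
null_from_generator = np.allclose(x_hat[lost], x_generated[lost])
```

In a diffusion-based receiver, a merge of this kind would typically be applied to the model's clean-signal estimate at each reverse step, so that the final sample agrees with what was received while the missing content follows the learned data distribution. Handling channel noise requires further modifications that this sketch omits.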

Diffusion models for audio semantic communication
