Diffusion models for audio semantic communication

Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello
2023-09-13
Abstract: Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and be prone to errors when trying to recover the transmitted bits. In contrast, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that treats the communication problem as an inverse problem, and is therefore robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple degradations at the same time, including corruption noise and missing parts caused by transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and with different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/
Sound, Emerging Technologies, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses two problems that arise when audio signals are transmitted over noisy channels: high bandwidth consumption and error-prone recovery of the original signal. Directly transmitting audio can consume a large amount of bandwidth, and errors are likely when the receiver attempts to recover the transmitted bitstream. To overcome these problems, the authors propose an audio semantic communication framework based on a generative model. The framework transmits a low-dimensional representation of the audio signal together with its semantic information, so that the receiver can generate content semantically consistent with the original signal without exactly recovering the bitstream.

### Main Problems and Solutions

1. **Problem Description**:
   - **Bandwidth Occupation**: Directly sending audio signals occupies a large amount of bandwidth.
   - **Recovery Error**: When audio signals are transmitted over noisy channels, the receiver is likely to make errors in recovering the original bitstream.
2. **Solutions**:
   - **Generative Model**: Use a generative model (specifically, a diffusion model) to solve the communication problem, treating it as an inverse problem.
   - **Low-Dimensional Representation**: Send a low-dimensional representation of the audio signal and its semantic information instead of the complete audio signal.
   - **Semantic Consistency**: At the receiver, use a conditional diffusion model to generate a semantically consistent audio signal, recovering the audio content even when the received signal is noisy and partially lost.

### Technical Details

- **Problem Modeling**: Model the audio communication problem as an inverse problem, that is, recovering the original audio or its semantic aspects from noisy and partially lost signals.
- **Diffusion Model**: Use the diffusion model to handle the inverse problem. The forward process destroys the data distribution by gradually adding noise; the reverse process then gradually removes the noise to recover the original signal.
- **Range-Nullspace Decomposition**: Use the range-nullspace decomposition to ensure that the generated content satisfies both the constraints of the inverse problem and the data distribution.

### Experimental Results

- **Denoising**: Across different peak signal-to-noise ratio (PSNR) conditions, the framework performs well in the denoising task, especially at low PSNR.
- **Inpainting**: When audio segments are partially lost, the framework generates semantically consistent audio content, with the largest gains when channel conditions are poor.

### Summary

This paper proposes a generative audio semantic communication framework. By transmitting low-dimensional representations and using a diffusion model to generate semantically consistent audio content at the receiver, it addresses the bandwidth consumption and recovery errors of traditional audio communication. According to the abstract, the framework outperforms competing methods in a real-world scenario and under different channel conditions.
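
The range-nullspace decomposition listed under Technical Details can be illustrated with a small NumPy sketch. It is a toy example under stated assumptions, not the authors' implementation: the degradation is assumed to be a noise-free linear operator `A` that drops samples (the missing-parts case), `x_hat` is a random stand-in for what a diffusion model would generate, and the function name `range_nullspace_combine` is hypothetical. The idea is that any signal splits as x = A⁺A x + (I − A⁺A) x, so the range part can be fixed from the received data while the generative model fills in only the null-space part.

```python
import numpy as np


def range_nullspace_combine(A, y, x_hat):
    """Combine received data y with a generated estimate x_hat.

    Returns A^+ y + (I - A^+ A) x_hat, which satisfies A x = y whenever
    y lies in the range of A (assumed here: no channel noise).
    """
    A_pinv = np.linalg.pinv(A)                  # Moore-Penrose pseudo-inverse A^+
    range_part = A_pinv @ y                     # component determined by the data
    null_part = x_hat - A_pinv @ (A @ x_hat)    # component invisible to A
    return range_part + null_part


rng = np.random.default_rng(0)
n = 8
x_true = rng.standard_normal(n)                 # toy "audio" signal

# Hypothetical channel: samples 2, 3 and 7 are lost, the rest arrive intact.
keep = np.array([1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)
A = np.eye(n)[keep]                             # 5 x 8 selection matrix
y = A @ x_true                                  # what the receiver observes

x_hat = rng.standard_normal(n)                  # stand-in for a generated sample
x_fixed = range_nullspace_combine(A, y, x_hat)

print("data consistent:", np.allclose(A @ x_fixed, y))
```

In this toy setting the received samples are reproduced exactly and only the lost samples come from `x_hat`. In a diffusion-based receiver a correction of this kind would typically be applied to the intermediate clean-signal estimate at each reverse step, and a noisy channel would need an additional relaxation that this sketch leaves out.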