Abstract: Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/
What problem does this paper attempt to address?
The paper addresses the following problem: **how to extend the bandwidth of an audio signal when the details of the lowpass degradation are unknown, as is typically the case when restoring historical audio recordings**.
Specifically, audio bandwidth extension means reconstructing high-frequency spectral content from bandlimited observations. When the parameters of the lowpass degradation are unknown (for example, in historical audio recordings), the problem becomes blind and considerably harder. To solve this blind bandwidth extension problem, the paper proposes a new method, **BABE** (Blind Audio Bandwidth Extension), which works in a zero-shot setting and relies on the generative prior of a pre-trained unconditional diffusion model.
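The blind setting can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: $x_0$ is the unknown full-band signal, $y$ is the bandlimited observation, and $f_\phi$ is a lowpass operator with unknown parameters $\phi$. Any measurement noise is omitted for simplicity.

$$
y = f_\phi(x_0), \qquad \text{with both } x_0 \text{ and } \phi \text{ unknown}
$$

$$
\text{find } (\hat{x}_0, \hat{\phi}) \text{ such that } f_{\hat{\phi}}(\hat{x}_0) \approx y \text{ and } \hat{x}_0 \text{ is plausible under the diffusion prior}
$$

In the informed (non-blind) case $\phi$ is given and only $x_0$ has to be estimated; in the blind case both are estimated from $y$ alone.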
### Key Problems and Challenges
1. **Blind Bandwidth Extension**: Historical audio recordings are usually bandlimited because of the technical limitations of the recording equipment, and the parameters of the lowpass degradation are unknown.
2. **Zero-Shot Setting**: No additional task-specific training is needed; the problem is solved directly at inference time.
3. **Generality**: The method must adapt to different types of lowpass filters and generalize well.
### Solutions
The method proposed in the paper mainly includes the following aspects:
1. **Application of a Diffusion Model**: Use the generative ability of a pre-trained unconditional diffusion model to gradually reconstruct the high-frequency part of the audio signal through the reverse diffusion process.
2. **Parameterized Lowpass Filter**: Model the degradation as a piecewise-linear lowpass filter, which can capture a wide range of lowpass responses with a small number of optimizable parameters.
3. **Joint Posterior Sampling and Filter Inference**: During the reverse diffusion (sampling) process, iteratively update both the audio signal and the lowpass filter parameters so that the reconstructed audio stays consistent with the original recording.
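To make the idea of a piecewise-linear lowpass filter in item 2 more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the choice of parameters (cutoff frequencies in Hz and slopes in dB per octave), the function names `piecewise_linear_lowpass_db` and `apply_lowpass`, and the zero-phase FFT-domain filtering are all assumptions made for illustration.

```python
import numpy as np


def piecewise_linear_lowpass_db(freqs_hz, cutoffs_hz, slopes_db_per_oct):
    """Gain in dB of a hypothetical piecewise-linear lowpass response.

    The response is flat (0 dB) below the first cutoff. Above cutoff i it
    falls by slopes_db_per_oct[i] per octave until the next cutoff, so the
    curve is piecewise linear on a log-frequency axis.
    """
    cutoffs = np.asarray(cutoffs_hz, dtype=float)
    slopes = np.asarray(slopes_db_per_oct, dtype=float)
    # Guard against log2(0) at the DC bin.
    f = np.maximum(np.asarray(freqs_hz, dtype=float), 1e-9)
    gain_db = np.zeros_like(f)
    for i, (fc, slope) in enumerate(zip(cutoffs, slopes)):
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else np.inf
        # Octaves travelled inside segment i (zero below its cutoff).
        octaves = np.log2(np.clip(f, fc, upper) / fc)
        gain_db -= slope * octaves
    return gain_db


def apply_lowpass(x, sample_rate, cutoffs_hz, slopes_db_per_oct):
    """Apply the response to a signal by scaling its FFT bins (zero phase).

    Assumption: circular, zero-phase filtering is used only to keep the
    sketch short; a real system might use a different filter realization.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    gain_db = piecewise_linear_lowpass_db(freqs, cutoffs_hz, slopes_db_per_oct)
    gain = 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(x))


if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    # One low tone (kept) plus one high tone (strongly attenuated).
    x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 8000 * t)
    y = apply_lowpass(x, sr, cutoffs_hz=[1000.0], slopes_db_per_oct=[20.0])
    print("input peak:", np.max(np.abs(x)), "output peak:", np.max(np.abs(y)))
```

Only a few numbers (here one cutoff and one slope) describe the whole response, which is what makes it feasible to treat them as unknowns and refine them alongside the audio estimate, as item 3 describes. In a blind setting these values would not be fixed by hand as in the usage example; they would be adjusted until the filtered reconstruction matches the observed recording.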
### Experimental Results
The paper evaluates BABE with both objective and subjective metrics. On synthetic data, BABE outperforms existing blind bandwidth extension baselines and performs competitively with informed methods, which are given the degradation in advance. BABE also generalizes robustly to real historical recordings, and subjective preference tests confirm a significant improvement in their audio quality.
### Summary
The paper tackles blind bandwidth extension of historical audio recordings. It proposes a zero-shot method based on a pre-trained diffusion model that reconstructs the missing high-frequency content without knowing the degradation in advance, and it supports the approach with objective evaluation on synthetic data and subjective tests on real historical recordings.