Abstract: Reverberation is a key element in spatial audio perception, historically achieved with the use of analogue devices, such as plate and spring reverb, and in the last decades with digital signal processing techniques that have allowed different approaches for Virtual Analogue Modelling (VAM). The electromechanical functioning of the spring reverb makes it a nonlinear system that is difficult to fully emulate in the digital domain with white-box modelling techniques. In this study, we compare five different neural network architectures, including convolutional and recurrent models, to assess their effectiveness in replicating the characteristics of this audio effect. The evaluation is conducted on two datasets at sampling rates of 16 kHz and 48 kHz. This paper specifically focuses on neural audio architectures that offer parametric control, aiming to advance the boundaries of current black-box modelling techniques in the domain of spring reverberation.

What problem does this paper attempt to address?

The paper asks how well different neural network architectures can replicate the characteristics of spring reverb, and in particular how to emulate the complex behaviour of this nonlinear system in the digital domain. It evaluates five architectures, including convolutional and recurrent models, and compares them systematically on two datasets with different sampling rates (16 kHz and 48 kHz).

### Research Background

Reverberation is a key element of spatial audio perception. It was historically produced with analogue devices such as plate and spring reverbs. In recent decades, digital signal processing has enabled several approaches to Virtual Analogue Modelling (VAM). Because of its electromechanical operation, however, the spring reverb is a nonlinear system that is difficult to emulate fully and accurately in the digital domain with white-box modelling techniques.

### Research Objectives

The paper focuses on neural audio architectures that offer parametric control, aiming to push the boundaries of current black-box modelling techniques for spring reverb. By comparing the performance of different architectures, it seeks the model best suited to real-time processing in high-fidelity audio applications.

### Main Contributions

1. **Model Comparison**: Evaluates five neural network architectures (TCN, WaveNet, GCN, LSTM, and GRU) on their ability to replicate the characteristics of spring reverb.
2. **Dataset Usage**: Uses two public datasets (SpringSet and EGFxSet), with experiments at sampling rates of 16 kHz and 48 kHz respectively.
3. **Performance Evaluation**: Assesses model performance with quantitative metrics (ESR, MRSTFT, and RTF) to keep the results reproducible and transparent.

### Conclusions

The WaveNet model performs well at a sampling rate of 16 kHz and captures the subtle features of spring reverb. The GCN model performs best at 48 kHz: it outperforms the other models on the MRSTFT metric and also shows advantages in real-time processing. These results support future real-time modelling of high-fidelity audio effects.

### Formula Summary

- **Total Loss Function**:

  \[ L = L_{\text{SmoothL1}} + L_{\text{STFT}} \]

- **ESR Calculation Formula**:

  \[ L_{\text{ESR}} = \frac{\sum_{i=0}^{N-1} |y_i - \hat{y}_i|^2}{\sum_{i=0}^{N-1} |y_i|^2} \]

  where \(y_i\) and \(\hat{y}_i\) are the \(i\)-th samples of the target and predicted signals, and \(N\) is the number of samples.

- **Multi-Resolution STFT Loss Function**:

  \[ L_{\text{MRSTFT}}(\hat{y}, y) = \sum_{m=1}^{M} \left( l_m^{\text{SC}}(\hat{y}, y) + \alpha \, l_m^{\text{SM}}(\hat{y}, y) \right) \]

  where \(M\) is the total number of resolutions, \(l_m^{\text{SC}}\) and \(l_m^{\text{SM}}\) are the spectral convergence and log-magnitude terms at resolution \(m\), \(\alpha\) is the weight factor of the log-magnitude loss, and \(|y|\) and \(|\hat{y}|\) are the magnitudes of the true value and the predicted value, from which the per-resolution terms are computed.

Together with the experimental results, these formulas show how the paper measures the quality of neural networks for audio effect modelling.
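The ESR and multi-resolution STFT formulas summarised above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation. The summary does not define the per-resolution terms, so the sketch assumes the commonly used forms: spectral convergence as a ratio of Frobenius norms, and the log-magnitude term as a mean absolute difference of log spectra. The FFT sizes, hop lengths, Hann window, `eps` constants, and the function names `esr`, `stft_mag`, `mrstft`, and `rtf` are all assumptions chosen for the example. The real-time factor is assumed to mean processing time divided by audio duration, which the summary does not state explicitly.

```python
import numpy as np


def esr(y, y_hat, eps=1e-12):
    """Error-to-signal ratio: sum |y - y_hat|^2 / sum |y|^2 (eps avoids 0/0)."""
    y = np.asarray(y, dtype=np.float64)
    y_hat = np.asarray(y_hat, dtype=np.float64)
    return float(np.sum((y - y_hat) ** 2) / (np.sum(y ** 2) + eps))


def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window; only frames that fit fully are used."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def mrstft(y_hat, y, resolutions=((256, 64), (512, 128), (1024, 256)),
           alpha=1.0, eps=1e-7):
    """Sum over resolutions of (spectral convergence + alpha * log-magnitude).

    The definitions of the two terms are assumed, not taken from the paper.
    """
    total = 0.0
    for n_fft, hop in resolutions:
        mag = stft_mag(y, n_fft, hop)
        mag_hat = stft_mag(y_hat, n_fft, hop)
        # Spectral convergence: Frobenius norm of the error over that of the target.
        sc = np.linalg.norm(mag - mag_hat) / (np.linalg.norm(mag) + eps)
        # Log-magnitude term: mean absolute difference of log spectra.
        sm = np.mean(np.abs(np.log(mag + eps) - np.log(mag_hat + eps)))
        total += sc + alpha * sm
    return float(total)


def rtf(processing_seconds, n_samples, sample_rate):
    """Real-time factor (assumed definition): processing time / audio duration."""
    return processing_seconds / (n_samples / sample_rate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)                       # 1 s at 16 kHz
    prediction = target + 0.1 * rng.standard_normal(16000)    # noisy estimate
    print("ESR:", esr(target, prediction))
    print("MRSTFT:", mrstft(prediction, target))
    print("RTF:", rtf(0.25, len(target), 16000))
```

Lower ESR and MRSTFT values indicate a closer match to the target. Under the assumed definition, an RTF below 1 means the model processes audio faster than it plays back.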

Evaluating Neural Networks Architectures for Spring Reverb Modelling
