A differentiable short-time Fourier transform with respect to the window length

Maxime Leiber, Axel Barrau, Yosra Marnissi, Dany Abboud
DOI: https://doi.org/10.48550/arXiv.2208.10886
2022-08-25
Abstract: In this paper, we revisit the use of spectrograms in neural networks by making the window length a continuous parameter optimizable by gradient descent instead of an empirically tuned integer-valued hyperparameter. The contribution is mostly theoretical at this point, but plugging the modified STFT into any existing neural network is straightforward. We first define a differentiable version of the STFT in the case where local bin centers are fixed and independent of the window length parameter. We then discuss the more difficult case where the window length affects the position and number of bins. We illustrate the benefits of this new tool on an estimation problem and a classification problem, showing it can be of interest not only to neural networks but to any STFT-based signal processing algorithm.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the following problem: when spectrograms generated by the short-time Fourier transform (STFT) are used in neural networks, the window length is usually fixed by trial and error or left at the default value of the signal processing library, without systematic study or justification. However, the window length determines the trade-off between the time and frequency resolutions of the spectrogram, so it needs to be set carefully.

The paper proposes a new paradigm that makes the window length of the STFT a continuous parameter that can be optimized online by gradient descent. The window length can then be optimized jointly with the neural network weights, with the aim of improving model performance.

The main contributions of the paper are:

1. **Define a differentiable STFT**: The paper proposes a differentiable version of the STFT in which the window length \( L \) is a continuous parameter and the spectrogram values are differentiable with respect to \( L \).
2. **Distinguish numerical window support and time resolution**: The window length parameter is decomposed into two variables, the numerical window support \( N \) and the time resolution \( \theta \), which ensures the differentiability of the STFT.
3. **Theoretical proof and formula derivation**: The paper provides detailed mathematical proofs and back-propagation formulas, allowing gradient-based optimization of the window length.
4. **Application verification**: The method is demonstrated on frequency tracking and speech recognition tasks, showing its potential in practical applications.

In summary, the paper introduces a differentiable STFT so that the window length can be optimized, with the goal of improving the performance of spectrogram-based neural networks and other signal processing algorithms.
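To make the idea of a spectrogram that is differentiable with respect to a continuous window length concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes a Hann-shaped window of continuous length `theta` centered inside a fixed integer support of `support` samples, so the frame positions and frequency bins do not depend on `theta`. The function names `hann_window` and `spectrogram_and_grad` are hypothetical.

```python
import numpy as np


def hann_window(theta, support):
    """Hann-shaped window of continuous length `theta` inside a fixed support.

    Assumed parameterization (not taken from the paper):
    w(n) = 0.5 + 0.5*cos(2*pi*u) for |u| <= 0.5, else 0,
    with u = (n - center) / theta. Returns the window and dw/dtheta.
    """
    if not 0 < theta <= support:
        raise ValueError("theta must satisfy 0 < theta <= support")
    n = np.arange(support)
    u = (n - (support - 1) / 2) / theta
    inside = np.abs(u) <= 0.5
    w = np.where(inside, 0.5 + 0.5 * np.cos(2 * np.pi * u), 0.0)
    # Chain rule: du/dtheta = -u / theta, so
    # dw/dtheta = pi * u * sin(2*pi*u) / theta (zero at the window edges).
    dw = np.where(inside, np.pi * u * np.sin(2 * np.pi * u) / theta, 0.0)
    return w, dw


def spectrogram_and_grad(x, theta, support, hop):
    """Return the spectrogram |STFT|^2 and its derivative with respect to theta."""
    w, dw = hann_window(theta, support)
    starts = np.arange(0, len(x) - support + 1, hop)
    frames = np.stack([x[s:s + support] for s in starts])
    X = np.fft.rfft(frames * w, axis=1)    # STFT with the current window
    dX = np.fft.rfft(frames * dw, axis=1)  # derivative of the STFT w.r.t. theta
    S = np.abs(X) ** 2
    dS = 2 * np.real(np.conj(X) * dX)      # d|X|^2 = 2 * Re(conj(X) * dX)
    return S, dS
```

Because `dS` is available in closed form, a loss defined on the spectrogram can be differentiated with respect to `theta` by the chain rule and `theta` updated by a gradient step, in the same way as any other trainable weight. The derivative can be sanity-checked against a central finite difference in `theta`.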