An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation

Thomas Sgouros, Angelos Bousis, Nikolaos Mitianoudis
DOI: https://doi.org/10.1109/access.2022.3221766
IF: 3.9
2022-11-22
IEEE Access
Abstract: The music source separation problem, where the task is to estimate the audio components present in a mixture, has long been at the centre of research activity. Recent frameworks tackle the problem with deep learning models that extract information about each component from Short-Time Fourier Transform (STFT) spectrograms given as input. Most approaches assume that only one source is active at each time-frequency point, which allows each point of the mixture to be allocated to the desired source. Since this assumption is strong and is reported not to hold in practice, using only the STFT magnitude as network input creates a problem: the Fourier phase information is absent when the separated sources are reconstructed. Recovering the Fourier phase is neither easily tractable nor computationally efficient. In this paper, we propose a novel Attentive MultiResUNet architecture that uses real-valued Short-Time Discrete Cosine Transform data as input. This choice avoids the phase recovery problem, because the appropriate values are estimated within the network itself rather than by complex estimation or post-processing algorithms. The proposed network has a U-Net type structure with residual skip connections and an attention mechanism that correlates each skip connection with the decoder output at the previous level. This is the first time the network has been used for source separation; it is more computationally efficient than state-of-the-art separation networks and achieves favourable performance compared with them at a fraction of the computational cost.
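The abstract's central claim is that a real-valued short-time transform sidesteps phase recovery. The following is a minimal sketch under our own assumptions, not the authors' implementation: it frames a signal with a square-root periodic Hann window at 50% overlap (window length 1024 and hop 512 are assumed values), applies an orthonormal DCT-II to each frame, and inverts by weighted overlap-add. The helper names `dct_matrix`, `sqrt_hann`, `stdct` and `istdct` are hypothetical, and the network's output is replaced by oracle STDCT coefficients of one source. Because the coefficients are real and signed, inverting an estimate of them yields the waveform directly, with no separate phase-estimation step.

```python
import numpy as np

# Assumed settings for illustration only (not taken from the paper).
N_FFT, HOP = 1024, 512


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: X = C @ x and x = C.T @ X (C is orthogonal)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so every row has unit norm
    return c


def sqrt_hann(n: int) -> np.ndarray:
    """Square-root periodic Hann window; its square overlap-adds to 1 at hop n/2."""
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))


def stdct(x: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Window each frame and apply a DCT-II; returns a real array (frames, n_fft)."""
    w, c = sqrt_hann(n_fft), dct_matrix(n_fft)
    xp = np.pad(x, (n_fft, n_fft))  # zero-pad so every original sample lies under two frames
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[t * hop:t * hop + n_fft] * w for t in range(n_frames)])
    return frames @ c.T             # each row equals C @ windowed_frame


def istdct(coeffs: np.ndarray, length: int, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Inverse DCT per frame, then weighted overlap-add; no phase information is used."""
    w, c = sqrt_hann(n_fft), dct_matrix(n_fft)
    frames = coeffs @ c             # each row equals C.T @ coefficient_row
    out = np.zeros((len(coeffs) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + n_fft] += frame * w
        norm[t * hop:t * hop + n_fft] += w ** 2
    out /= np.where(norm > 1e-8, norm, 1.0)  # undo the summed squared windows
    return out[n_fft:n_fft + length]         # strip the padding added in stdct


fs = 8000
t = np.arange(fs) / fs                         # one second of toy audio
s1 = 0.6 * np.sin(2 * np.pi * 220 * t)         # toy "source 1"
s2 = 0.3 * np.sin(2 * np.pi * 1250 * t + 0.7)  # toy "source 2"
mix = s1 + s2

X, S1 = stdct(mix), stdct(s1)                  # real-valued: sign plays the role of phase
s1_hat = istdct(S1, len(mix))                  # oracle stand-in for a network estimate
print(np.max(np.abs(s1_hat - s1)))             # should be near floating-point precision
```

In a real system the oracle `S1` would be replaced by the network's estimate (or by a real-valued mask applied to `X`); the inverse is the same either way. A magnitude-only STFT estimate, by contrast, still needs a phase, typically borrowed from the mixture or estimated iteratively.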