Improving Drum Source Separation with Temporal-Frequency Statistical Descriptors

Si Li, Jiaxing Liu, Peilin Li, Dichucheng Li, Xinlu Liu, Yongwei Gao, Wei Li
DOI: https://doi.org/10.1109/icme57554.2024.10688211
2024-01-01
Abstract: Drum Source Separation (DSS) aims to separate drum mixtures into individual drum sounds, such as kick and snare. Deep neural network methods have been successfully applied to source separation. However, because existing datasets are small and drum sounds overlap strongly in both frequency and time, these methods still fall short on this task. To address these challenges, we construct a large drum sound dataset and propose a novel training objective to improve performance on the DSS task. The training objective leverages three temporal-frequency statistical descriptors (spectral centroid, spectral spread, and spectral flux) to separate drum sources. Our experimental results demonstrate that our method yields an SDR improvement of 0.98 dB on UNet and 1.07 dB on MERT. Furthermore, our method achieves consistent improvements in low-resource and cross-dataset scenarios. Our code and dataset are available at https://github.com/150042/Drum-Separation-TF.
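For readers unfamiliar with the three descriptors named in the abstract, the following is a minimal sketch of how they are commonly computed from a magnitude spectrogram. It uses standard textbook definitions, not necessarily the paper's exact formulation. The function names `spectral_descriptors` and `descriptor_loss`, the `eps` constant, and the L1 comparison of descriptors are all illustrative assumptions; the abstract does not specify how the descriptors enter the training objective.

```python
import numpy as np

def spectral_descriptors(mag, freqs, eps=1e-10):
    """Per-frame spectral centroid, spread, and flux (textbook definitions).

    mag:   (n_bins, n_frames) non-negative magnitude spectrogram
    freqs: (n_bins,) center frequency of each bin in Hz
    eps:   small constant (an assumption) to avoid division by zero
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)

    total = mag.sum(axis=0) + eps
    # Centroid: magnitude-weighted mean frequency of each frame.
    centroid = (freqs[:, None] * mag).sum(axis=0) / total
    # Spread: magnitude-weighted standard deviation around the centroid.
    deviation = freqs[:, None] - centroid[None, :]
    spread = np.sqrt((deviation ** 2 * mag).sum(axis=0) / total)
    # Flux: L2 norm of the frame-to-frame change; the first frame gets 0.
    diff = np.diff(mag, axis=1)
    flux = np.concatenate([[0.0], np.sqrt((diff ** 2).sum(axis=0))])
    return centroid, spread, flux

def descriptor_loss(est_mag, ref_mag, freqs):
    """Hypothetical auxiliary loss: mean absolute descriptor mismatch."""
    est = spectral_descriptors(est_mag, freqs)
    ref = spectral_descriptors(ref_mag, freqs)
    return sum(np.mean(np.abs(e - r)) for e, r in zip(est, ref))
```

In a training setup, a term like this would typically be added to the usual reconstruction loss for each separated drum stem, so that estimates are penalized when their spectral shape or onset behavior drifts from the reference.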