Abstract: This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increase in computational cost. It consists of three components: (i) multi-domain loss (MDL), (ii) a bridging operation, which couples the individual instrument networks, and (iii) combination loss (CL). MDL makes it possible to exploit both the frequency- and time-domain representations of audio signals. We modify the target network, i.e., the network architecture of the original DNN-based MSS, by adding bridging paths so that the branches for the individual output instruments share information. MDL is then applied to combinations of the output sources as well as to each individual source; hence, we call it CL. MDL and CL can easily be applied to many DNN-based separation methods because they are merely loss functions: they are used only during training and do not affect the inference step. The bridging operation does not increase the number of learnable parameters in the network. Experiments compared Open-Unmix (UMX), densely connected multidilated DenseNet (D3Net), and convolutional time-domain audio separation network (Conv-TasNet) extended with our X-scheme (called X-UMX, X-D3Net, and X-Conv-TasNet, respectively) against their original versions, and the results confirmed the validity of the X-scheme. We also verified the effectiveness of the X-scheme in a large-scale data regime, showing its generality with respect to data size. X-UMX Large (X-UMXL), which was trained on large-scale internal data and used in our experiments, is newly available at https://github.com/asteroid-team/asteroid/tree/master/egs/musdb18/X-UMX.
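The following minimal NumPy sketch makes the three components concrete. It is not the authors' released implementation. Everything specific in it is an assumption for illustration:

- The function names are hypothetical.
- The time-domain term is a plain negative SDR; the paper's exact time-domain loss may differ.
- The weight `alpha=10.0` and the STFT sizes are arbitrary.
- The combination loss is taken over every non-empty proper subset of the sources.
- Bridging is modelled as replacing each branch's hidden features with their cross-branch mean.

```python
# Hypothetical sketch of MDL, CL, and bridging. The names, alpha, STFT sizes,
# the subset rule, and the mean-based bridge are illustrative assumptions.
import itertools

import numpy as np


def stft_mag(x, n_fft=256, hop=64):
    """Magnitude STFT: Hann-windowed frames (no padding) -> |rfft|."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def neg_sdr(est, ref, eps=1e-8):
    """Time-domain term: negative signal-to-distortion ratio in dB."""
    signal = np.sum(ref ** 2)
    distortion = np.sum((ref - est) ** 2)
    return -10.0 * np.log10((signal + eps) / (distortion + eps))


def multi_domain_loss(est, ref, alpha=10.0):
    """MDL sketch: frequency-domain MSE on magnitudes + alpha * time-domain term."""
    freq_term = np.mean((stft_mag(est) - stft_mag(ref)) ** 2)
    return freq_term + alpha * neg_sdr(est, ref)


def source_combinations(n_sources):
    """Every non-empty subset of source indices smaller than the full mixture."""
    return [combo
            for k in range(1, n_sources)
            for combo in itertools.combinations(range(n_sources), k)]


def combination_loss(estimates, references, alpha=10.0):
    """CL sketch: average MDL over the summed estimates/targets of each subset."""
    losses = []
    for combo in source_combinations(len(estimates)):
        est_sum = sum(estimates[j] for j in combo)
        ref_sum = sum(references[j] for j in combo)
        losses.append(multi_domain_loss(est_sum, ref_sum, alpha))
    return float(np.mean(losses))


def bridge(hidden_per_source):
    """Bridging sketch: every branch receives the cross-branch mean (no weights)."""
    mean = np.mean(np.stack(hidden_per_source), axis=0)
    return [mean.copy() for _ in hidden_per_source]
```

With four sources, `source_combinations(4)` yields 4 + 6 + 4 = 14 subsets, so each training step scores the sums of two and three sources as well as the single sources. Because `bridge` only averages existing features, it introduces no learnable weights. This matches the abstract's statement that bridging adds no parameters. Both loss functions are used only at training time, so inference is unchanged.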

An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation

Multi-channel U-Net for Music Source Separation

Pre-training Music Classification Models via Music Source Separation

Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform

Music Source Separation in the Waveform Domain

Music Source Separation Based on a Lightweight Deep Learning Framework (DTTNET: DUAL-PATH TFC-TDF UNET)

Hierarchic Temporal Convolutional Network With Cross-Domain Encoder for Music Source Separation

U-NET: A Supervised Approach for Monaural Source Separation

Music Source Separation Using Stacked Hourglass Networks

The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Network

Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet

Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation

Multichannel Blind Music Source Separation Using Directivity-Aware MNMF With Harmonicity Constraints

D3Net: Densely connected multidilated DenseNet for music source separation

SCNet: Sparse Compression Network for Music Source Separation

Audio query-based music source separation

Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

Blind Source Separation Based on Improved Wave-U-Net Network