Dynamic Slimmable Network for Speech Separation

Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets
DOI: https://doi.org/10.1109/lsp.2024.3445304
2024-09-03
IEEE Signal Processing Letters
Abstract: Neural networks for speech separation generally exhibit high computational costs and large memory footprints. Moreover, typical separation networks have a fixed computational graph that processes all input frames at a uniform computational cost, even though intensive processing may not be necessary for frames containing silence or a single active speaker. Addressing this computational inefficiency is especially crucial when these networks are deployed on resource-constrained devices. In this letter, we propose a dynamic slimmable network for speech separation that mitigates the computational inefficiency of existing networks. We introduce slimmable layers with a gating mechanism that can adapt their computational complexity based on the input characteristics. As an example, we propose to use the slimmable layers in the intra-chunk blocks of a dual-path structure-based network to facilitate adaptation based on the local characteristics of the input signal. Experimental evaluation on simulated two-speaker mixtures from the WSJ0-2mix dataset demonstrates that the proposed method substantially reduces the computational cost while maintaining comparable performance to fully utilized static networks.
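The idea of a slimmable layer whose width is chosen per chunk by a gate can be illustrated with a minimal sketch. This is not the authors' implementation. It makes the following assumptions:

- A hand-set energy threshold stands in for the learned gating mechanism.
- A single shared-weight linear layer stands in for a slimmable intra-chunk block.
- Chunks are non-overlapping.
- The names `SlimmableLinear`, `energy_gate` and `process_sequence`, along with the widths and thresholds, are hypothetical.

The sketch shows how running low-energy chunks (e.g., silence) at a narrower width reduces the multiply-accumulate (MAC) count while reusing the same weights.

```python
import math
import random


class SlimmableLinear:
    """Hypothetical slimmable linear layer: all widths share one weight matrix.

    Running at a narrower `width` evaluates only the first `width` output
    units, so the cost scales with `width` rather than `max_width`.
    """

    def __init__(self, in_dim, max_width, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.in_dim = in_dim
        self.max_width = max_width
        self.weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                       for _ in range(max_width)]
        self.bias = [0.0] * max_width

    def forward(self, x, width):
        # Only rows 0..width-1 are evaluated; the remaining rows are skipped.
        y = [sum(w * v for w, v in zip(self.weight[j], x)) + self.bias[j]
             for j in range(width)]
        # Zero-pad so downstream code always sees a max_width vector.
        return y + [0.0] * (self.max_width - width)


def energy_gate(chunk, widths, thresholds):
    """Hypothetical stand-in for a learned gate: map mean chunk energy to a width."""
    n = sum(len(frame) for frame in chunk)
    energy = sum(v * v for frame in chunk for v in frame) / n
    for width, threshold in zip(widths, thresholds):
        if energy <= threshold:
            return width
    return widths[-1]


def process_sequence(layer, frames, chunk_size, widths, thresholds):
    """Split frames into chunks, gate each chunk once, and count MACs."""
    outputs, macs, chosen = [], 0, []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        width = energy_gate(chunk, widths, thresholds)
        chosen.append(width)
        outputs.extend(layer.forward(frame, width) for frame in chunk)
        macs += len(chunk) * width * layer.in_dim
    return outputs, macs, chosen


if __name__ == "__main__":
    in_dim, chunk_size = 8, 4
    layer = SlimmableLinear(in_dim, max_width=16)
    silence = [[0.0] * in_dim for _ in range(chunk_size)]
    speech = [[1.0 if (t + k) % 2 else -1.0 for k in range(in_dim)]
              for t in range(chunk_size)]
    outputs, macs, chosen = process_sequence(
        layer, silence + speech, chunk_size,
        widths=(4, 8, 16), thresholds=(1e-3, 1e-1))
    full = len(outputs) * layer.max_width * in_dim
    print(chosen, macs, full)  # [4, 16] 640 1024
```

In this example, the silent chunk runs at width 4 and the high-energy chunk at width 16. This gives 640 MACs instead of the 1024 needed at full width. In a real model, the gate would be learned rather than hand-thresholded, and the cost of the gate itself would also have to be counted.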
engineering, electrical & electronic