Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments

Renana Opochinsky,Mordehay Moradi,Sharon Gannot
2024-01-07
Abstract:Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal. The increasing complexity of real-world environments, where multiple speakers might converse simultaneously, underscores the importance of effective speech separation techniques. This work presents a single-microphone speaker separation network with TF attention aiming at noisy and reverberant environments. We dub this new architecture as Separation TF Attention Network (Sep-TFAnet). In addition, we present a variant of the separation network, dubbed $ \text{Sep-TFAnet}^{\text{VAD}}$, which incorporates a voice activity detector (VAD) into the separation network.
Audio and Speech Processing,Sound
What problem does this paper attempt to address?