Speech Enhancement Using Open-Unmix Music Source Separation Architecture

Kartik Kumar Thakur, Srijita Choudhury, Saswata Ghosh, Saransh Dash, Taranpreet Singh Chabbra, Israj Ali, Rashmi T Shankarappa, Sourabh Tiwari, Saksham Goyal
DOI: https://doi.org/10.1109/delcon54057.2022.9753157
2022-02-11
Abstract: Speech enhancement through sound source separation helps prevent background noise from degrading the quality of human speech in voice calls, video calls, and voice assistant commands. Open-Unmix is a popular architecture used by researchers for music source separation. This paper proposes a modified implementation of the Open-Unmix model that applies sound source separation to speech enhancement. It describes in detail the custom dataset collection and the pre-processing methods used to generate the training data. The improved Open-Unmix model is a deep neural network that estimates separation masks in the short-time Fourier transform (STFT) domain. It is based on a three-layer bidirectional long short-term memory (Bi-LSTM) network with fully connected encoding and decoding layers. The overall assessment uses four popular objective measures for blind source separation of audio signals: source-to-artifact ratio (SAR), scale-invariant source-to-distortion ratio (SI-SDR), short-time objective intelligibility (STOI), and source-to-distortion ratio (SDR). Experimental results show that the proposed method can separate background noise from human speech and provide enhancement in real environments.
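Of the evaluation measures named in the abstract, SI-SDR has a compact closed form: the estimate is projected onto the reference signal, and the energy of that projection is compared with the energy of the residual. The following is a minimal sketch of the standard definition, not the authors' evaluation code. The function name `si_sdr`, the `eps` stabiliser, and the synthetic test signals are assumptions made for illustration, and the mean-removal step that some implementations apply is omitted.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Illustrative helper (hypothetical name); eps only guards against
    division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Synthetic example (assumed data): a 5-cycle sine as "speech" and a
# low-amplitude 13-cycle cosine as "noise", over 100 samples.
clean = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.1 * math.cos(2 * math.pi * 13 * n / 100) for n in range(100)]
noisy = [c + v for c, v in zip(clean, noise)]

score = si_sdr(clean, noisy)
scaled_score = si_sdr(clean, [3.0 * x for x in noisy])
```

Because the estimate is rescaled before comparison, `score` and `scaled_score` agree up to floating-point error, which is the property that distinguishes SI-SDR from plain SDR.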