Separating Voices from Multiple Sound Sources Using 2D Microphone Array

Xinran Lu, Lei Xie, Fang Wang, Tao Gu, Chuyu Wang, Wei Wang, Sanglu Lu
DOI: https://doi.org/10.1109/infocom48880.2022.9796768
2022-01-01
Abstract: Voice assistants have been widely used for human-computer interaction and automatic meeting minutes. However, when multiple sound sources are present, the speech recognition performance of voice assistants decreases dramatically. It is therefore crucial to separate multiple voices efficiently for effective voice assistant applications in multi-user scenarios. In this paper, we present a novel voice separation system that uses a 2D microphone array in multiple-sound-source scenarios. Specifically, we propose a spatial filtering-based method that iteratively estimates the Angle of Arrival (AoA) of each sound source and separates the voice signals with adaptive beamforming. We use BeamForming-based cross-Correlation (BF-Correlation) to accurately assess the performance of beamforming and automatically optimize the voice separation within the iterative framework. Unlike plain cross-correlation, BF-Correlation performs cross-correlation among the voice signals obtained after beamforming with each linear microphone array. In this way, mutual interference from voice signals outside the specified direction can be effectively suppressed or mitigated via spatial filtering. We implement a prototype system and evaluate its performance in real environments. Experimental results show that, in the presence of three sound sources, the average AoA error is 1.4 degrees and the average ratio of automatic speech recognition accuracy is 90.2%.