Speech Activity Detection and Speaker Localization Based on Distributed Microphones.

Yi Yang, Jingyun Zhang, Jiasong Sun
DOI: https://doi.org/10.1007/978-3-319-40542-1_64
2016-01-01
Abstract: Speech activity detection aims to distinguish speech from non-speech sections in audio data. The technology has been widely used in speech recognition, speech enhancement and speaker diarization, where most systems adopt multiple thresholds, noise reduction, Gaussian Mixture Models (GMM) or Deep Neural Networks (DNN) as the state of the art. Because it serves as the front end of these applications, the precision of speech activity detection and speaker localization seriously affects overall system performance. Overcoming the interference caused by indoor reverberation and environmental noise, however, remains the bottleneck in improving single-channel detection accuracy. A distributed microphone system consists of microphones scattered around the same room or space, each with its own device for collecting data. It can exploit the time delay of the sound source to suppress interference from non-speech signals, and it places no prior requirements on microphone location or synchronization, both of which are strictly regulated in a microphone array. Owing to this convenience, distributed microphone systems are increasingly applied in smart homes, in-vehicle hands-free communication and monitoring. In this paper, a method of enhanced Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) based on distributed microphones is proposed and compared with the same method on a single channel. On several distributed-microphone datasets, the proposed method achieves best-case increases of twenty-four percent and eighteen percent in detection precision and recall, respectively. In addition, the correct rate of 3D-coordinate speaker localization is shown to improve by thirty percent.
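The abstract refers to using the time delay of the sound source between microphones. As a rough illustration only, the sketch below estimates such a delay between two channels with plain cross-correlation. This is a generic textbook technique and not necessarily what the paper does. The function name `estimate_delay`, the 16 kHz sample rate and the synthetic white-noise signal are all assumptions made for the example.

```python
import numpy as np


def estimate_delay(ref, sig, sample_rate):
    """Estimate how much `sig` lags `ref`, in seconds (hypothetical helper).

    Uses the peak of the full cross-correlation. A positive result means
    the sound reached the `sig` microphone later than the `ref` microphone.
    """
    corr = np.correlate(sig, ref, mode="full")
    # Output index i of the full correlation corresponds to lag i - (len(ref) - 1).
    lag_samples = int(np.argmax(corr)) - (len(ref) - 1)
    return lag_samples / sample_rate


# Synthetic check: the second channel is the first one delayed by 25 samples.
sample_rate = 16000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
mic_a = rng.standard_normal(1000)
mic_b = np.concatenate([np.zeros(25), mic_a[:-25]])

delay_seconds = estimate_delay(mic_a, mic_b, sample_rate)
print(delay_seconds * sample_rate)  # estimated delay expressed in samples
```

In a real room, reverberation and noise blur the correlation peak, so practical systems often use weighted variants of cross-correlation. The plain version is shown here only because it is the simplest to follow.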