Abstract: Speech activity detection aims to distinguish speech from non-speech segments in audio data. This technology has been widely used in speech recognition, speech enhancement and speaker diarization, where state-of-the-art approaches mostly adopt multiple thresholds, noise reduction, Gaussian Mixture Models (GMM) or Deep Neural Networks (DNN). As the front end of these applications, the precision of speech activity detection and speaker localization seriously affects overall system performance. However, overcoming the interference caused by indoor reverberation and environmental noise remains the bottleneck for improving detection accuracy with a single channel. A distributed microphone system consists of microphones scattered throughout the same room or space, each with its own device for collecting data. Such a system can exploit the time delay of the sound source between microphones to suppress interference from non-speech signals. It also imposes no prior requirements on microphone location or synchronization, both of which are strictly constrained in a conventional microphone array. Because of this convenience, distributed microphone systems are increasingly applied in smart homes, vehicle hands-free communication and monitoring. In this paper, an enhanced Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) method based on distributed microphones is proposed and compared with the same method applied to a single channel. On several distributed-microphone datasets, the proposed method improves detection precision and recall by up to twenty-four percent and eighteen percent, respectively. In addition, the accuracy of 3D-coordinate speaker localization is shown to increase by thirty percent over the baseline.
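The abstract credits distributed microphones with exploiting the time delay of the sound source between channels, but it does not say how that delay is estimated. One standard estimator is the generalized cross-correlation with phase transform (GCC-PHAT). The sketch below shows that textbook technique; it is not presented as the paper's method. It also assumes the two recordings already share a common clock and sample rate, which unsynchronized distributed devices would first have to establish. The helper name `gcc_phat` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags `ref` using GCC-PHAT.

    Hypothetical helper for illustration; assumes both channels are sampled
    at the same rate `fs` on a common clock.
    """
    n = sig.size + ref.size  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:  # e.g. microphone spacing / speed of sound
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index 0 is lag -max_shift and index 2*max_shift is lag +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(fs // 4)          # 0.25 s of noise-like test signal
    delay = 12                                   # true lag in samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    sig += 0.1 * rng.standard_normal(sig.size)   # independent sensor noise
    tau = gcc_phat(sig, ref, fs)
    print(f"estimated delay: {tau * 1e3:.3f} ms ({round(tau * fs)} samples)")
```

The PHAT weighting discards spectral magnitude and keeps only phase. This sharpens the correlation peak and makes it less sensitive to reverberation than plain cross-correlation. Passing `max_tau` restricts the search to physically possible lags for a given microphone spacing.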

Speech Activity Detection and Speaker Localization Based on Distributed Microphones.
