Deep Learning Based Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li
DOI: https://doi.org/10.48550/arxiv.2210.10265
2022-01-01
Abstract: While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to obtain 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that selects the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods. The proposed node selection further refines performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded and described for the first time in this paper; it is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
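
To make the triangulation step concrete, the following is a minimal sketch of how per-node DOA estimates might be combined into a 2D position. It is not the authors' implementation: the abstract mentions triangulation followed by clustering and node selection, whereas this sketch substitutes a single least-squares intersection of all bearing lines. The function name `triangulate`, the assumption that node positions are known, and the assumption that all DOAs are expressed in one shared global frame (radians, counter-clockwise from the +x axis) are illustrative choices, not details taken from the paper.

```python
import math


def triangulate(nodes, doas_rad):
    """Least-squares intersection of bearing lines (illustrative sketch).

    nodes:    list of (x, y) node positions (assumed known).
    doas_rad: list of DOA angles in radians, one per node, assumed to be
              in a shared global frame, counter-clockwise from the +x axis.
    Returns the (x, y) point minimizing the summed squared perpendicular
    distance to all bearing lines. Bearings are treated as full lines, so
    the direction sign of each ray is ignored.
    """
    if len(nodes) != len(doas_rad) or len(nodes) < 2:
        raise ValueError("need at least two nodes with one DOA each")

    # Accumulate the 2x2 normal equations (A^T A) p = A^T b, where each
    # row of A is the unit normal n_i of bearing line i and b_i = n_i . p_i.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in zip(nodes, doas_rad):
        nx, ny = -math.sin(theta), math.cos(theta)  # normal to the bearing
        c = nx * px + ny * py
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; no unique fix")

    # Closed-form solution of the symmetric 2x2 system.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Usage: three hypothetical nodes observing a speaker at (2, 3) with
# noise-free DOAs; the estimate should land very close to (2, 3).
speaker = (2.0, 3.0)
nodes = [(0.0, 0.0), (5.0, 0.0), (0.0, 6.0)]
doas = [math.atan2(speaker[1] - py, speaker[0] - px) for px, py in nodes]
est = triangulate(nodes, doas)
```

With noisy DOA estimates the bearing lines no longer meet at a single point, which is presumably why the described method clusters candidate intersections and filters unreliable nodes rather than relying on one global fit.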