Raw Waveform Based End-To-End Deep Convolutional Network For Spatial Localization Of Multiple Acoustic Sources

Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054090
2020-01-01
Abstract: In this paper, we present an end-to-end deep convolutional neural network that operates on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep-learning-based approaches work well for localizing a single source directly from multi-channel raw audio, but they are not easily extended to multiple sources because of the well-known permutation problem. We propose a novel encoding scheme to represent the spatial coordinates of multiple sources, which facilitates 2D localization of multiple sources in an end-to-end fashion, avoiding the permutation problem and achieving arbitrary spatial resolution. Experiments on a simulated data set and real recordings from the AV16.3 Corpus demonstrate that the proposed method generalizes well to unseen test conditions and outperforms a recent time difference of arrival (TDOA) based multiple source localization approach reported in the literature.
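The permutation problem mentioned in the abstract can be illustrated with a small sketch. It is not the encoding scheme proposed in the paper, which the abstract does not describe; the occupancy grid below is a hypothetical stand-in chosen only because it is order-independent, and its fixed cell size means it does not reproduce the paper's claim of arbitrary spatial resolution. The room size, grid size, and all function names are assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length lists of (x, y) points."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def encode_grid(sources, size=4.0, cells=8):
    """Hypothetical order-free target: mark each source's cell in a
    cells x cells occupancy grid covering a size x size area."""
    grid = [[0] * cells for _ in range(cells)]
    for x, y in sources:
        col = min(int(x / size * cells), cells - 1)
        row = min(int(y / size * cells), cells - 1)
        grid[row][col] = 1
    return grid


truth = [(1.0, 3.0), (3.0, 1.0)]
swapped = [(3.0, 1.0), (1.0, 3.0)]  # same two sources, listed in the other order

# A loss on an ordered coordinate list penalizes a correct answer
# that merely lists the sources in a different order.
ordered_loss = mse(truth, swapped)

# Searching over all orderings removes the penalty, but the number of
# orderings grows factorially with the number of sources.
best_loss = min(mse(truth, list(p)) for p in permutations(swapped))

# An order-free encoding maps both listings to the same target.
same_grid = encode_grid(truth) == encode_grid(swapped)

print(ordered_loss, best_loss, same_grid)
```

In this toy setup the ordered loss is nonzero even though both lists describe the same two sources, while the grid target is identical for either ordering, which is the property an order-free encoding is meant to provide.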