Abstract: Multi-source sound localization has applications in many domains, including auditory scene analysis, fault detection and diagnosis in manufacturing, and augmented reality. In the far field, 3D sound source localization is equivalent to finding the direction of arrival (DOA), namely the azimuth and elevation angles of the sound sources. Recent DOA estimation pipelines take multichannel audio inputs, extract spectral features from each channel, and feed them into a deep neural network. Unfortunately, the spectral features contain only the time-frequency information of the audio signals, while spatial information is captured only implicitly in the differences between channels, which depend heavily on the acoustic array geometry. To embed the spatial information of the sound sources into the spectral feature representation, we propose a spatial mapping method based on delay-and-sum beamforming (DSB) that encodes sound source location information. It can be combined with different feature extraction methods and machine learning models for DOA estimation. Furthermore, a redundancy removal procedure is proposed to accelerate the DSB computation so that the pipeline can run in real time on embedded GPUs such as the NVIDIA Jetson Nano. We conduct extensive experiments using two neural network models along with the DSB method on two datasets. The experiments demonstrate that the DSB method effectively reduces DOA errors: when DSB is combined with feature extraction, the DOA errors are reduced by up to 19.24%. In addition, the feature extraction process is accelerated by up to 30.42% after redundancy removal is applied.
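
To make the idea of delay-and-sum beamforming (DSB) concrete, the following is a minimal sketch of a far-field, frequency-domain delay-and-sum beamformer that scans a set of azimuth angles and returns the output power for each. It is an illustration under stated assumptions, not the spatial mapping or redundancy removal procedure proposed in the paper: the function names (`unit_vector`, `dsb_power`, `dsb_azimuth_map`), the speed of sound, and the array layout passed in by the caller are all hypothetical choices made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def unit_vector(azimuth, elevation):
    """Unit vector pointing from the array toward a far-field source (radians)."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])


def dsb_power(frames, mic_positions, fs, azimuth, elevation):
    """Delay-and-sum output power for one steering direction.

    frames:        (n_mics, n_samples) time-domain signals
    mic_positions: (n_mics, 3) microphone coordinates in metres
    fs:            sampling rate in Hz
    """
    n_mics, n_samples = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # A plane wave from direction u reaches microphone m earlier than the
    # array origin by (p_m . u) / c seconds.
    advance = mic_positions @ unit_vector(azimuth, elevation) / SPEED_OF_SOUND
    # Delay each channel by its own advance so all channels line up in time.
    steering = np.exp(-2j * np.pi * freqs[None, :] * advance[:, None])
    beam = (spectra * steering).sum(axis=0) / n_mics
    return float(np.sum(np.abs(beam) ** 2))


def dsb_azimuth_map(frames, mic_positions, fs, azimuths, elevation=0.0):
    """Steered output power for each candidate azimuth at a fixed elevation."""
    return np.array([
        dsb_power(frames, mic_positions, fs, az, elevation) for az in azimuths
    ])
```

The azimuth with the largest power is the beamformer's DOA estimate for a single dominant source. A full azimuth-elevation map would repeat the scan over a grid of elevations, which is where the cost of evaluating many steering directions becomes significant.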

Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization

Passive Localization Based on Double-Correlation Function Using Sources of Opportunity

Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

A Two Microphone-Based Approach For Source Localization Of Multiple Speech Sources

A Cascaded Multiple-Speaker Localization and Tracking System

Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization

Joint DOA Estimation and Dereverberation Based on Multi-Channel Linear Prediction Filtering and Azimuth Sparsity

Source Localization by Multidimensional Steered Response Power Mapping with Sparse Bayesian Learning

SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization

An iteratively reweighted steered response power approach to multisource localization using a distributed microphone network

Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector

Towards Robust Multiple Blind Source Localization Using Source Separation and Beamforming

Probabilistic Binaural Multiple Sources Localization Based On Time-Delay Compensation Estimator And Clustering Analysis

Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform

Two-dimensional detection based LRSS point recognition for multi-source DOA estimation

3D Single Source Localization Based on Euclidean Distance Matrices

Delay-and-Sum Beamforming Based Spatial Mapping for Multi-Source Sound Localization

Multiple Sound Source Localization Using Gammatone Auditory Filtering and Direct Sound Componence Detection

Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

Passive Localization Based on Spectral Estimation Methods Using Sources of Opportunity

Two microphone based direction of arrival estimation for multiple speech sources using spectral properties of speech