Abstract: Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches assume recordings made by non-coincident microphones and rely on methods that are susceptible to differences in room reverberation. We present a CRNN able to estimate the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently published approach. We also characterize our model's performance as a function of sound source distance and different training losses. This analysis reveals that training is optimal with a loss that weighs model errors as an inverse function of the sound source's true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
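The abstract does not give the exact form of the distance-dependent weighting. As a minimal sketch, assuming the simplest inverse function (a weight of 1/d on each absolute error, which turns the loss into a mean relative error), the idea could look like the following in plain Python. The function names `inverse_distance_weighted_mae` and `plain_mae` are hypothetical, and a real training loop would compute this on framework tensors rather than Python lists.

```python
from typing import Sequence


def plain_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Ordinary mean absolute error in metres (unweighted baseline)."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def inverse_distance_weighted_mae(
    pred: Sequence[float], true: Sequence[float], eps: float = 1e-6
) -> float:
    """Mean absolute error with each term weighted by 1 / true distance.

    Assumption: the weighting is exactly 1/d; the paper only states that
    errors are weighted by an inverse function of the true distance.
    `eps` avoids division by zero for a (degenerate) zero distance.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, true):
        # Absolute error scaled by the reciprocal of the true distance.
        total += abs(p - t) / max(t, eps)
    return total / len(pred)


if __name__ == "__main__":
    pred = [1.5, 6.0]  # predicted distances (m)
    true = [1.0, 4.0]  # ground-truth distances (m)
    print(plain_mae(pred, true))                      # 1.25
    print(inverse_distance_weighted_mae(pred, true))  # 0.5
```

With predictions of 1.5 m and 6.0 m for sources at 1.0 m and 4.0 m, the plain MAE is 1.25 m and is dominated by the far source, while the weighted loss scores both errors equally at 0.5. This illustrates how inverse-distance weighting keeps large absolute errors on distant sources from dominating the loss.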

What problem does this paper attempt to address?

The problem that this paper attempts to solve is **estimating the distance of sound sources under diverse and dynamic acoustic conditions**. Specifically, the paper points out the following:

1. **Distance estimation is under-researched within sound source localization (SSL)**:
   - Compared with direction-of-arrival (DOA) estimation, sound source distance estimation has received little research attention.
   - Existing methods assume recordings made with non-coincident microphones and are sensitive to differences in room reverberation.
2. **Limitations of existing methods**:
   - Existing distance-estimation methods often assume a known room reverberation time (T60), and many apply only to specific room configurations or small datasets.
   - Data-driven methods also have limitations, such as treating the task as a classification problem rather than estimating the distance directly.
3. **The proposed method**:
   - The authors propose a convolutional recurrent neural network (CRNN) that estimates the distance of moving sound sources across multiple datasets covering different room environments.
   - The authors improve performance through the choice of training loss, in particular by weighting each error inversely with the true distance of the sound source.
4. **Main contributions**:
   - Added distance annotations to existing open-source datasets.
   - Developed a deep-learning model capable of estimating sound source distance across a variety of acoustic conditions.
   - Analyzed how different loss functions affect model performance, finding that training works best with a loss that weights the model error as an inverse function of the true source distance.

In summary, this paper aims to estimate sound source distance more accurately in diverse acoustic environments using deep learning, thereby addressing gaps in existing research.
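The summary does not describe how the distance annotations were produced. One plausible route, sketched here under the assumption that each frame provides the source's Cartesian position and the microphone array's centre in a shared metric coordinate system, is to convert each position into DOA angles plus a distance. The function names `to_doa_and_distance` and `annotate_trajectory`, and the angle convention (azimuth measured from the x-axis towards y, elevation from the horizontal plane), are hypothetical; real datasets differ in their conventions.

```python
import math
from typing import List, Sequence, Tuple


def to_doa_and_distance(
    source: Sequence[float], mic: Sequence[float] = (0.0, 0.0, 0.0)
) -> Tuple[float, float, float]:
    """Return (azimuth_deg, elevation_deg, distance_m) of `source` relative to `mic`.

    Assumed convention: azimuth = atan2(y, x), elevation measured from the
    horizontal (x-y) plane; both in degrees.
    """
    dx, dy, dz = (s - m for s, m in zip(source, mic))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation, distance


def annotate_trajectory(
    trajectory: Sequence[Sequence[float]],
    mic: Sequence[float] = (0.0, 0.0, 0.0),
) -> List[Tuple[float, float, float]]:
    """Per-frame (azimuth, elevation, distance) labels for a moving source."""
    return [to_doa_and_distance(position, mic) for position in trajectory]


if __name__ == "__main__":
    # A source moving away from the array along the x-axis.
    frames = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    for label in annotate_trajectory(frames):
        print(label)
```

For a moving source, applying `annotate_trajectory` to its per-frame positions yields a per-frame distance target alongside the DOA labels, which is the kind of joint target a DOA-plus-distance model would be trained on.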

Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

A robust super-resolution approach with sparsity constraint for near-field wideband acoustic imaging

Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform

Multilevel B-Splines-Based Learning Approach for Sound Source Localization

SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization

Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Speaker Distance Estimation in Enclosures from Single-Channel Audio

Delay-and-Sum Beamforming Based Spatial Mapping for Multi-Source Sound Localization

Sound source localization method based time-domain signal feature using deep learning

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks

A Deep Learning Method for DOA Estimation with Covariance Matrices in Reverberant Environments

Sound source localization for auditory perception of a humanoid robot using deep neural networks

Spherical Convolutional Recurrent Neural Network for Real-Time Sound Source Tracking

Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information

Binaural sound source localization using a hybrid time and frequency domain model

Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators

Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture

Dynamic-Structured Reservoir Spiking Neural Network in Sound Localization

Deep Residual Network for Sound Source Localization in the Time Domain

A survey of sound source localization with deep learning methods