Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

Saksham Singh Kushwaha, Iran R. Roman, Magdalena Fuentes, Juan Pablo Bello
2023-09-17
Abstract: Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches assume recordings by non-coincident microphones to use methods that are susceptible to differences in room reverberation. We present a CRNN able to estimate the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently-published approach. We also characterize our model's performance as a function of sound source distance and different training losses. This analysis reveals optimal training using a loss that weighs model errors as an inverse function of the sound source true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
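The abstract's key training finding is a loss that weighs each error by an inverse function of the true source distance. The snippet below is a minimal sketch assuming that function is simply 1/d applied to the absolute error (a mean absolute relative error); the names `mean_absolute_error` and `inverse_distance_weighted_l1` are hypothetical, and the paper's exact loss may differ. It contrasts the weighted loss with a plain L1 loss on two toy predictions that each miss by 0.5 m.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Plain L1 loss: every metre of error counts the same at any distance."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def inverse_distance_weighted_l1(
    pred: Sequence[float], true: Sequence[float], eps: float = 1e-6
) -> float:
    """Hypothetical inverse-distance-weighted loss (assumed weight = 1 / d).

    Each absolute error is divided by the true distance, so a 0.5 m miss on a
    1 m source costs more than the same miss on a 4 m source. `eps` guards
    against division by zero for sources at (near) zero distance.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    return sum(abs(p - t) / max(t, eps) for p, t in zip(pred, true)) / len(pred)


# Toy distances in metres, for illustration only.
true_d = [1.0, 4.0]
pred_d = [1.5, 4.5]
print(mean_absolute_error(pred_d, true_d))           # 0.5
print(inverse_distance_weighted_l1(pred_d, true_d))  # 0.3125
```

With equal 0.5 m misses, the plain L1 loss treats both sources alike, while the weighted loss lets the miss on the 1 m source contribute four times as much as the miss on the 4 m source.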
Sound; Audio and Speech Processing
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **estimating the distance of sound sources under diverse and dynamic acoustic conditions**. Specifically, the paper points out:

1. **Distance estimation within the sound source localization (SSL) task is under-researched**:
   - Compared with direction-of-arrival (DOA) estimation, sound source distance estimation has received little attention.
   - Existing methods assume recordings from non-coincident (spatially separated) microphones and are sensitive to differences in room reverberation.
2. **Limitations of existing methods**:
   - Existing distance-estimation methods usually assume that the room reverberation time (T60) is known, and many apply only to specific room configurations or small-scale datasets.
   - Data-driven methods also have limitations, such as treating the task as a classification problem rather than estimating the distance directly.
3. **The proposed method**:
   - The authors propose a convolutional recurrent neural network (CRNN) that estimates the distance of moving sound sources across multiple datasets covering different rooms.
   - The model's performance depends on the training loss; the best results come from a loss that weights each error by an inverse function of the sound source's true distance.
4. **Main contributions**:
   - Adding distance annotations to existing open-source datasets.
   - Developing a deep-learning model that estimates sound source distance across a variety of acoustic conditions.
   - Analyzing how different loss functions affect model performance, and showing that inverse-distance error weighting gives the best training.

In summary, this paper aims to estimate sound source distance more accurately in diverse acoustic environments using deep learning, addressing gaps in existing research.
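Characterizing performance as a function of source distance, as the abstract describes, amounts to grouping errors by true distance. The snippet below is a minimal sketch assuming fixed-width distance bins and mean absolute error per bin; `error_by_distance_bin`, the 1 m bin width, and the toy values are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def error_by_distance_bin(
    pred: Sequence[float], true: Sequence[float], bin_width: float = 1.0
) -> Dict[Tuple[float, float], float]:
    """Mean absolute error per true-distance bin [k*w, (k+1)*w), in metres."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    err_sum: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for p, t in zip(pred, true):
        k = int(t // bin_width)  # index of the bin containing the true distance
        err_sum[k] += abs(p - t)
        count[k] += 1
    # Report bins in order of increasing distance, keyed by (lower, upper) edge.
    return {
        (k * bin_width, (k + 1) * bin_width): err_sum[k] / count[k]
        for k in sorted(err_sum)
    }


# Toy values in metres, for illustration only.
true_d = [0.5, 0.75, 1.5, 2.5]
pred_d = [0.75, 0.5, 1.0, 3.5]
print(error_by_distance_bin(pred_d, true_d))
# {(0.0, 1.0): 0.25, (1.0, 2.0): 0.5, (2.0, 3.0): 1.0}
```

Comparing such per-bin tables across models trained with different losses is one way to see whether a loss shifts accuracy toward near or far sources.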