Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information

Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely
DOI: https://doi.org/10.1109/taslp.2024.3357037
2024-01-01
Abstract: Direction-of-arrival (DOA) estimation is a fundamental task in audio signal processing that becomes difficult in real-world environments due to the presence of reverberation. To address this difficulty, Direct-Path Dominance (DPD) tests have been proposed as an effective approach for detecting time-frequency (TF) bins dominated by direct sound, which contain accurate DOA information. These tests have been found to be particularly efficient when working with spherical arrays. While methods based on neural networks (NNs) have been developed to estimate the DOA, they have limitations, such as the need for a large training database and an often limited understanding of the system's operation. This work proposes two novel DPD-test methods based on a model-based deep learning approach that combines the original DPD-test model with a data-driven system. This makes it possible to preserve the robustness of the original DPD test across acoustic environments, while using a data-driven approach to better extract useful information about the direct sound, thereby enhancing the original method's performance. In particular, the paper investigates how energetic, temporal, and spatial information contribute to the identification of TF bins dominated by the direct signal. The proposed methods are trained on simulated data of a single sound source in a room, and evaluated on simulated and real data. The results show that energetic and temporal information provide new information about the direct sound that has not been considered in previous works and that can improve the performance of the DPD test.
engineering, electrical & electronic; acoustics