Scalable-Complexity Steered Response Power Mapping based on Low-Rank and Sparse Interpolation

Thomas Dietzen, Enzo De Sena, Toon van Waterschoot
DOI: https://doi.org/10.1109/TASLP.2024.3496317
2024-11-22
Abstract: The steered response power (SRP) is a popular approach to compute a map of the acoustic scene, typically used for acoustic source localization. The SRP map is obtained as the frequency-weighted output power of a beamformer steered towards a grid of candidate locations. Due to the exhaustive search over a fine grid at all frequency bins, conventional frequency domain-based SRP (conv. FD-SRP) results in a high computational complexity. Time domain-based SRP (conv. TD-SRP) implementations reduce computational complexity at the cost of accuracy using the inverse fast Fourier transform (iFFT). In this paper, to enable a more favourable complexity-performance trade-off as compared to conv. FD-SRP and conv. TD-SRP, we consider the problem of constructing a fine SRP map over the entire search space at scalable computational cost. We propose two approaches to this problem. Expressing the conv. FD-SRP map as a matrix transform of frequency-domain generalized cross-correlations (GCCs), we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While sampling can be implemented by the iFFT, we propose to use optimal low-rank or sparse approximations of the interpolation matrix for complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in various localization scenarios with speech as source signals and compared to the state-of-the-art. The results indicate that SSPI-SRP performs better if large array apertures are used, while SLRI-SRP performs better at small array apertures or a large number of microphones. In comparison to conv. FD-SRP, two to three orders of magnitude of complexity reduction can be achieved, oftentimes enabling a more favourable complexity-performance trade-off as compared to conv. TD-SRP. A MATLAB implementation is available online.
Audio and Speech Processing, Signal Processing
What problem does this paper attempt to address?
The paper aims to reduce the computational complexity of the steered response power (SRP) method for acoustic source localization while preserving localization accuracy. Conventional frequency-domain SRP (conv. FD-SRP) is accurate but computationally expensive, because it searches exhaustively over a fine grid of candidate locations at every frequency bin. Conventional time-domain SRP (conv. TD-SRP) lowers the complexity by using the inverse fast Fourier transform (iFFT), but does so at the cost of accuracy, especially for small array apertures.

The paper therefore proposes two methods that construct a fine SRP map over the entire search space at scalable computational cost: sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP). Both express the conv. FD-SRP map as a matrix transform of the frequency-domain generalized cross-correlations (GCCs) and decompose the SRP matrix into a sampling matrix and an interpolation matrix. Sampling can be implemented efficiently by the iFFT, while the interpolation matrix is replaced by an optimal low-rank or sparse approximation to reduce complexity.

The main contributions of the paper are:

1. **Low-complexity interpolation scheme**: In contrast to existing interpolation-based SRP methods, the paper derives a scalable, low-complexity interpolation scheme from a globally optimal criterion. The low-rank and sparse approximations of the interpolation matrix are obtained as solutions of quadratic optimization problems with low-rank and sparsity constraints, and are therefore optimal in the squared-error sense.
2. **Complexity-performance trade-off analysis**: The paper studies the trade-off between complexity and performance in detail. SLRI-SRP and SSPI-SRP are evaluated in various localization scenarios and compared with low-rank-based SRP (LR-SRP) and with non-exhaustive spatial search methods. With few microphones and a large array aperture, SSPI-SRP outperforms SLRI-SRP and LR-SRP over a wide range of complexities; with a small array aperture or a large number of microphones, SLRI-SRP performs better. Compared with conv. FD-SRP, both methods reduce complexity by two to three orders of magnitude, and they often offer a more favourable complexity-performance trade-off than conv. TD-SRP.
3. **MATLAB implementation**: The paper provides a MATLAB implementation online, so that researchers and engineers can test and apply the methods.

In summary, the proposed low-rank and sparse interpolation techniques aim to reduce the computational complexity of SRP substantially while preserving accuracy, making fine-grid source localization more practical.
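To make the complexity argument concrete, the following is a minimal Python sketch of approximating an interpolation matrix by a low-rank or a sparse matrix. It is not the paper's method or its MATLAB code. The toy matrix `A` (sinc interpolation from integer sample lags to fractional candidate delays), the chosen sizes, and the helper names `truncated_svd_factors` and `keep_largest_per_row` are all assumptions made for illustration. The low-rank part uses a plain truncated SVD, which minimizes the unweighted Frobenius error, and the sparse part simply keeps the largest-magnitude entries in each row; the paper's quadratic optimization problems may use a different criterion.

```python
import numpy as np

def truncated_svd_factors(A, rank):
    """Return factors (left, right) with left @ right the best rank-`rank`
    approximation of A in the Frobenius norm, plus all singular values."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    left = U[:, :rank] * s[:rank]   # scale the kept left singular vectors
    right = Vh[:rank, :]
    return left, right, s

def keep_largest_per_row(A, nnz_per_row):
    """Zero all but the `nnz_per_row` largest-magnitude entries of each row."""
    S = np.zeros_like(A)
    idx = np.argsort(np.abs(A), axis=1)[:, -nnz_per_row:]
    rows = np.arange(A.shape[0])[:, None]
    S[rows, idx] = A[rows, idx]
    return S

# Hypothetical toy interpolation matrix: maps 17 time-domain samples
# (integer lags) to 200 candidate delays given in fractional samples.
lags = np.arange(-8, 9)
delays = np.linspace(-6.0, 6.0, 200)
A = np.sinc(delays[:, None] - lags[None, :])   # shape (200, 17)

rank = 6
left, right, s = truncated_svd_factors(A, rank)
A_lr = left @ right
err_lr = np.linalg.norm(A - A_lr)              # Frobenius error, low-rank

nnz_per_row = 4
A_sp = keep_largest_per_row(A, nnz_per_row)
err_sp = np.linalg.norm(A - A_sp)              # Frobenius error, sparse

# Real multiplications needed to map one sample vector to the fine grid.
mults_full = A.shape[0] * A.shape[1]
mults_lr = rank * (A.shape[0] + A.shape[1])    # apply `right`, then `left`
mults_sp = int(np.count_nonzero(A_sp))

# Applying the factors one after the other never forms A_lr explicitly.
x = np.random.default_rng(0).standard_normal(A.shape[1])
y_lr = left @ (right @ x)
y_sp = A_sp @ x

print(f"full: {mults_full} mults, low-rank: {mults_lr}, sparse: {mults_sp}")
print(f"errors: low-rank {err_lr:.3f}, sparse {err_sp:.3f}")
```

In this sketch, `rank` and `nnz_per_row` play the role of the tuning knobs that make the cost scalable: larger values reduce the approximation error and increase the number of multiplications per map. Which of the two approximations is more accurate at a given cost depends on the structure of the matrix, which is consistent with the summary's observation that the preferred method depends on array aperture and microphone count.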