Subspace Hybrid MVDR Beamforming for Augmented Hearing

Sina Hafezi, Alastair H. Moore, Pierre H. Guiraud, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner
2023-12-01
Abstract: Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforward. The design of robust, high-performance, adaptive beamformers for such scenarios is an on-going challenge. This is due to the violation of the typically required assumptions on the noise field caused by, for example, rapid variations resulting from complex acoustic environments, and/or rotations of the listener's head. This work proposes a multi-channel speech enhancement algorithm which utilises the adaptability of signal-dependent beamformers while still benefiting from the computational efficiency and robust performance of signal-independent super-directive beamformers. The algorithm has two stages. (i) The first stage is a hybrid beamformer based on a dictionary of weights corresponding to a set of noise field models. (ii) The second stage is a wide-band subspace post-filter to remove any artifacts resulting from (i). The algorithm is evaluated using both real-world recordings and simulations of a cocktail-party scenario. Noise suppression, intelligibility and speech quality results show a significant performance improvement by the proposed algorithm compared to the baseline super-directive beamformer. A data-driven implementation of the noise field dictionary is shown to provide more noise suppression, and similar speech intelligibility and quality, compared to a parametric dictionary.
Audio and Speech Processing, Sound, Signal Processing
What problem does this paper attempt to address?
This paper addresses speech enhancement in complex acoustic scenarios for augmented-reality audio with head-worn microphone arrays. Specifically, it aims to improve the quality and intelligibility of speech from one or more target speakers in the presence of local interfering sources and ambient noise. The challenge is that existing signal-dependent beamformers degrade significantly when the sound field changes rapidly (for example, because of head rotation) or when the noise field is non-isotropic. The goal is therefore a multichannel speech-enhancement algorithm that adapts to these scenarios while remaining computationally efficient and robust.

To this end, the paper proposes a new multichannel speech-enhancement algorithm with two stages:

1. **Hybrid Beamforming**: A dictionary of beamformer weights, each corresponding to a noise-field model, provides the adaptability of a signal-dependent beamformer.
2. **Broadband Subspace Post-filtering**: Any artifacts introduced by the first stage are removed to further improve speech quality and intelligibility.

Through these two stages, the algorithm aims to provide better noise suppression than traditional super-directive beamformers without sacrificing computational efficiency. The paper evaluates the algorithm on real-world recordings and on simulations of a cocktail-party scenario. The results show that it outperforms the baseline super-directive beamformer in noise suppression, speech intelligibility, and speech quality. In addition, a data-driven noise-field dictionary provides more noise suppression than a parametric dictionary while maintaining similar speech intelligibility and quality.
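The hybrid beamforming stage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`mvdr_weights`, `build_dictionary`, `hybrid_beamform`), the diagonal loading, and the selection rule (per frame, keep the dictionary entry with the lowest output power) are assumptions made for illustration; the summary above only states that the weights come from a dictionary tied to noise-field models. The sketch handles a single frequency bin and omits the subspace post-filter.

```python
import numpy as np


def mvdr_weights(R, d, loading=1e-3):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one noise covariance model.

    R: (M, M) Hermitian noise covariance model.
    d: (M,) steering vector towards the target.
    loading: relative diagonal loading (an assumption, added for stability).
    """
    M = R.shape[0]
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)


def build_dictionary(noise_models, d):
    """Precompute one MVDR weight vector per noise-field model; returns (K, M)."""
    return np.stack([mvdr_weights(R, d) for R in noise_models])


def hybrid_beamform(X, W):
    """Apply every dictionary entry and keep the lowest-power output per frame.

    X: (M, T) multichannel STFT frames for one frequency bin.
    W: (K, M) dictionary of weight vectors.
    Returns the selected output (T,) and the chosen entry index per frame (T,).
    """
    Y = W.conj() @ X                          # (K, T) candidate outputs w_k^H x
    best = np.argmin(np.abs(Y) ** 2, axis=0)  # assumed rule: minimum output power
    return Y[best, np.arange(X.shape[1])], best
```

Because every entry satisfies the distortionless constraint towards the target, choosing the lowest-power output among them can only reduce residual noise relative to any single fixed entry in that frame. The weights are computed once offline, so the run-time cost is K inner products per frame rather than a covariance estimate and matrix inversion.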
In summary, the paper's main contribution is a multichannel speech-enhancement algorithm that works effectively in complex acoustic environments, offering a new approach to speech processing in augmented-reality audio applications.