Abstract: Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario, be it real-world or simulated, is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforward. The design of robust, high-performance, adaptive beamformers for such scenarios is an ongoing challenge. This is due to the violation of the typically required assumptions on the noise field caused by, for example, rapid variations resulting from complex acoustic environments and/or rotations of the listener's head. This work proposes a multi-channel speech enhancement algorithm which utilises the adaptability of signal-dependent beamformers while still benefiting from the computational efficiency and robust performance of signal-independent super-directive beamformers. The algorithm has two stages. (i) The first stage is a hybrid beamformer based on a dictionary of weights corresponding to a set of noise field models. (ii) The second stage is a wide-band subspace post-filter to remove any artifacts resulting from (i). The algorithm is evaluated using both real-world recordings and simulations of a cocktail-party scenario. Noise suppression, intelligibility and speech quality results show a significant performance improvement by the proposed algorithm compared to the baseline super-directive beamformer. A data-driven implementation of the noise field dictionary is shown to provide more noise suppression, and similar speech intelligibility and quality, compared to a parametric dictionary.
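For readers unfamiliar with the baseline, the signal-independent super-directive beamformer can be sketched as an MVDR beamformer whose noise covariance is fixed to a diffuse (spherically isotropic) noise model. This is a minimal sketch, assuming a free-field array of omnidirectional microphones and a single frequency bin; head-worn arrays such as those in the paper would use measured or modelled transfer functions instead. The function names, the array geometry and the diagonal-loading value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(mic_pos, direction, freq, c=SPEED_OF_SOUND):
    """Free-field plane-wave steering vector for omnidirectional mics (assumed model)."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    k = 2.0 * np.pi * freq / c
    # Phase grows with each mic's projection onto the look direction (one sign convention).
    return np.exp(1j * k * (mic_pos @ u))


def diffuse_coherence(mic_pos, freq, c=SPEED_OF_SOUND):
    """Spherically isotropic noise coherence Gamma_ij = sin(k r_ij) / (k r_ij)."""
    k = 2.0 * np.pi * freq / c
    dist = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(k * dist / np.pi)  # np.sinc(x) = sin(pi x) / (pi x), equal to 1 at r = 0


def superdirective_weights(mic_pos, direction, freq, loading=1e-2):
    """MVDR weights for the diffuse model: w = A^-1 d / (d^H A^-1 d), A = Gamma + loading * I.

    The weights depend only on geometry and frequency, not on the observed signals,
    which is what makes the beamformer signal-independent. Diagonal loading trades
    directivity for robustness to sensor noise and array errors.
    """
    gamma = diffuse_coherence(mic_pos, freq)
    d = steering_vector(mic_pos, direction, freq)
    a_inv_d = np.linalg.solve(gamma + loading * np.eye(len(d)), d)
    return a_inv_d / (d.conj() @ a_inv_d), d, gamma


def directivity(w, d, gamma):
    """Array gain against diffuse noise: |w^H d|^2 / (w^H Gamma w)."""
    return np.abs(w.conj() @ d) ** 2 / np.real(w.conj() @ gamma @ w)


def white_noise_gain(w, d):
    """Array gain against spatially white noise: |w^H d|^2 / (w^H w)."""
    return np.abs(w.conj() @ d) ** 2 / np.real(w.conj() @ w)


# Hypothetical 4-mic linear array (2 cm spacing) steered to endfire at 1 kHz.
mics = np.array([[x, 0.0, 0.0] for x in (-0.03, -0.01, 0.01, 0.03)])
w_sd, d, gamma = superdirective_weights(mics, [1.0, 0.0, 0.0], 1000.0)
w_das = d / len(d)  # delay-and-sum reference with the same look direction
print("directivity:", directivity(w_sd, d, gamma), "vs", directivity(w_das, d, gamma))
print("white noise gain:", white_noise_gain(w_sd, d), "vs", white_noise_gain(w_das, d))
```

The super-directive weights achieve at least the directivity of delay-and-sum at the cost of lower white noise gain; increasing `loading` moves them back toward delay-and-sum, which is the usual robustness knob for such fixed designs.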

What problem does this paper attempt to address?

This paper addresses speech enhancement in complex acoustic scenarios for augmented-reality audio captured with head-mounted microphone arrays. Specifically, it asks how to improve the quality and intelligibility of one or more target speakers in the presence of local interfering sources and ambient noise. The difficulty is that existing signal-dependent beamformers rely on assumptions about the noise field, and their performance degrades significantly when those assumptions are violated, for example by rapidly changing sound fields (such as those caused by head rotations) or by non-isotropic noise fields. The goal is therefore a multichannel speech-enhancement algorithm that adapts to these complex scenarios while remaining computationally efficient and robust.

To this end, the paper proposes a two-stage multichannel speech-enhancement algorithm:

1. **Hybrid Beamforming**: a beamformer built from a dictionary of weights, each corresponding to a noise-field model, which gains the adaptability of a signal-dependent beamformer while keeping the efficiency and robustness of a signal-independent super-directive design.
2. **Wide-band Subspace Post-filtering**: a post-filter that removes artifacts introduced by the first stage, further improving speech quality and intelligibility.

Together, the two stages aim to provide better noise suppression than a conventional super-directive beamformer without sacrificing computational efficiency. The paper evaluates the algorithm on real-world recordings and on simulations of a "cocktail party" scenario. The results show that it outperforms the baseline super-directive beamformer in noise suppression, speech intelligibility, and speech quality. In addition, a data-driven noise-field dictionary yields more noise suppression than a parametric dictionary, with similar speech intelligibility and quality.
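The hybrid beamforming stage can be sketched for a single frequency bin as follows. This is a minimal sketch under loudly stated assumptions: the dictionary is a hypothetical parametric one (diffuse noise alone, or diffuse noise plus one directional interferer at each candidate azimuth), every entry is an MVDR weight vector steered at the target, and the entry is chosen by minimum output power on the observed spatial covariance. The selection rule, the interferer power, the free-field array model and all names are assumptions rather than the paper's method, and the wide-band subspace post-filter is not modelled.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def steering_vector(mic_pos, azimuth_deg, freq):
    """Free-field plane-wave steering vector in the horizontal plane (assumed array model)."""
    az = np.deg2rad(azimuth_deg)
    u = np.array([np.cos(az), np.sin(az), 0.0])
    return np.exp(2j * np.pi * freq / C * (mic_pos @ u))


def diffuse_coherence(mic_pos, freq):
    """Spherically isotropic coherence sin(kr)/(kr); np.sinc(x) = sin(pi x)/(pi x)."""
    dist = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(2.0 * freq * dist / C)


def mvdr_weights(noise_cov, d):
    """w = R^-1 d / (d^H R^-1 d), distortionless toward the target steering vector d."""
    r_inv_d = np.linalg.solve(noise_cov, d)
    return r_inv_d / (d.conj() @ r_inv_d)


def noise_models(mic_pos, interferer_azs, freq, interferer_power=10.0, loading=1e-2):
    """Hypothetical parametric models: diffuse only, then diffuse plus one interferer per azimuth."""
    base = diffuse_coherence(mic_pos, freq) + loading * np.eye(len(mic_pos))
    models = [base]
    for az in interferer_azs:
        v = steering_vector(mic_pos, az, freq)
        models.append(base + interferer_power * np.outer(v, v.conj()))
    return models


def build_dictionary(models, d):
    """One MVDR weight vector per noise-field model, stacked as rows (K x M)."""
    return np.stack([mvdr_weights(r, d) for r in models])


def estimate_covariance(frames):
    """Sample spatial covariance R[i, j] = mean_t x_i x_j^* for frames of shape (T, M)."""
    return frames.T @ frames.conj() / frames.shape[0]


def select_weight(dictionary, observed_cov):
    """Return (index, weight) of the entry with minimum output power w^H R w.

    Every entry satisfies w^H d = 1, so the target passes unchanged and the lowest
    output power means the least residual noise and interference (assumed rule).
    """
    power = np.real(np.einsum("km,mn,kn->k", dictionary.conj(), observed_cov, dictionary))
    best = int(np.argmin(power))
    return best, dictionary[best]


# Hypothetical 4-mic circular array (radius 5 cm), one 2 kHz bin, target at 0 degrees.
angles = np.deg2rad([45.0, 135.0, 225.0, 315.0])
mics = 0.05 * np.stack([np.cos(angles), np.sin(angles), np.zeros(4)], axis=1)
freq, candidates = 2000.0, [90.0, 180.0, 270.0]
d = steering_vector(mics, 0.0, freq)
models = noise_models(mics, candidates, freq)
W = build_dictionary(models, d)

# Idealised observation: unit-power target plus the "interferer at 90 degrees" noise field.
observed = np.outer(d, d.conj()) + models[1]
best, w = select_weight(W, observed)
label = "diffuse only" if best == 0 else f"interferer at {candidates[best - 1]} deg"
print("selected model:", best, "->", label)
```

Because the weights are precomputed, the per-bin work at run time is only evaluating a few quadratic forms, which is the kind of efficiency the hybrid design aims for. In practice `observed_cov` would be estimated from recent STFT frames (for example with `estimate_covariance` or a recursive average), and a data-driven dictionary would replace `noise_models` with covariances learned from recordings.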
In summary, the main contribution of this paper is a multichannel speech-enhancement algorithm that works effectively in complex acoustic environments, offering a new approach to speech processing in augmented-reality audio applications.

Subspace Hybrid MVDR Beamforming for Augmented Hearing

Subspace Hybrid Beamforming for Head-worn Microphone Arrays

A High-Resolution and Low-Frequency Acoustic Beamforming Based on Bayesian Inference and Non-Synchronous Measurements

Deep Learning Based Speech Beamforming

Benefits of triple acoustic beamforming during speech-on-speech masking and sound localization for bilateral cochlear-implant users

Virtual Augmentation of the Beamforming Array Based on a Sub-cross-spectral Matrix Computation for Localizing Stationary Signal Noise Sources

Cognitive-Driven Binaural Beamforming Using EEG-Based Auditory Attention Decoding

Adaptive Dereverberation, Noise and Interferer Reduction Using Sparse Weighted Linearly Constrained Minimum Power Beamforming

A Speech Enhancement System Based on Real-time Sound Source Localization and Super-directional Fixed Beamforming

An Efficient Near-Field Parallel Subarray Beamforming for Portable 3D Imaging Sonar

GSC-like Speech Enhancement for Dual Small Microphone Array

Masking-based Neural Beamformer for Multichannel Speech Enhancement

Adaptive beamforming for array signal processing in aeroacoustic measurements

A Speech Enhancement Method Combining Beamforming with RNN for Hearing Aids

A Corpus-Based Evaluation of Beamforming Techniques and Phase-Based Frequency Masking

Attention-Based Beamformer For Multi-Channel Speech Enhancement

Advanced Beamformers for Cochlear Implant Users: Acute Measurement of Speech Perception in Challenging Listening Conditions

Design of a robust MVDR beamforming method with Low-Latency by reconstructing covariance matrix for speech enhancement

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers

Secure Hybrid Analog and Digital Beamforming for mmWave XR Communications with Mixed-DAC

Cooperative Audio Source Separation and Enhancement Using Distributed Microphone Arrays and Wearable Devices