Abstract: To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech and the noise components. Furthermore, for single-microphone scenarios, multi-frame algorithms such as the multi-frame MVDR (MFMVDR) filter have been proposed, which exploit temporal instead of spatial correlations. In this contribution, we propose a binaural extension of the MFMVDR filter, which exploits both spatial and temporal correlations. The binaural MFMVDR filters are embedded in an end-to-end deep learning framework, where the required parameters, i.e., the speech spatio-temporal correlation vectors as well as the (inverse) noise spatio-temporal covariance matrix, are estimated by temporal convolutional networks (TCNs) that are trained by minimizing the mean spectral absolute error loss function. Simulation results comprising measured binaural room impulses and diverse noise sources at signal-to-noise ratios from -5 dB to 20 dB demonstrate the advantage of utilizing the binaural MFMVDR filter structure over directly estimating the binaural multi-frame filter coefficients with TCNs.

What problem does this paper attempt to address?

This paper addresses the problem of improving speech intelligibility and speech quality for binaural hearing-aid devices in noisy environments. Specifically, it proposes a binaural extension of the multi-frame minimum variance distortionless response (MFMVDR) filter for noise reduction.

### Problem Background

In many speech communication scenarios, head-worn assistive listening devices (such as binaural hearing aids) capture not only the voice of the target speaker but also environmental noise, which degrades speech quality and speech intelligibility. Researchers have therefore proposed a variety of binaural noise reduction algorithms. These algorithms usually assume that adjacent short-time Fourier transform (STFT) coefficients are uncorrelated over time. Under this assumption, the speech STFT coefficients at the left and right reference microphones can be estimated by applying a single-frame binaural filter. This assumption discards the temporal correlation between adjacent STFT coefficients. Multi-frame methods have been proposed to exploit that temporal correlation in single-microphone or multi-microphone noise reduction.

### Solution Proposed in the Paper

The paper proposes a binaural MFMVDR filter that exploits spatial and temporal correlations simultaneously. Specifically, the authors extend the single-microphone MFMVDR filter to the binaural scenario and embed it in an end-to-end deep learning framework. In this framework, the required parameters (i.e., the speech spatio-temporal correlation vectors and the inverse of the noise spatio-temporal covariance matrix) are estimated by temporal convolutional networks (TCNs), which are trained by minimizing the mean spectral absolute error (MSAE) loss function.

### Formula Representation

1. **Signal Model**: In the STFT domain, the noisy signal \(y_{m,f,t}\) at the \(m\)-th microphone, the \(f\)-th frequency bin and the \(t\)-th time frame can be expressed as: \[ y_{m,f,t}=x_{m,f,t}+n_{m,f,t} \] where \(x_{m,f,t}\) and \(n_{m,f,t}\) represent the speech and noise components, respectively.

2. **Multi-frame Vector**: For single-microphone multi-frame noise reduction, the noisy multi-frame vector \(\bar{y}_{m,t}\in\mathbb{C}^N\), which stacks the \(N\) most recent frames and omits the frequency index \(f\) for brevity, is defined as: \[ \bar{y}_{m,t} = [y_{m,t},\dots,y_{m,t - N + 1}]^T \]

3. **Multi-microphone Multi-frame Vector**: For multi-microphone multi-frame noise reduction with \(M\) microphones, the noisy vector \(y_t\in\mathbb{C}^{NM}\) is defined as: \[ y_t = [\bar{y}_{1,t}^T,\dots,\bar{y}_{M,t}^T]^T \]

4. **Optimization Problem**: To minimize the output noise power spectral density while keeping the correlated speech component undistorted, the filter \(w_{m,t}\) for reference microphone \(m\) is obtained from: \[ \arg\min_{w_{m,t}}w_{m,t}^H\Phi_{n,t}w_{m,t}\quad\text{s.t.}\quad w_{m,t}^H\gamma_{x,m,t}=1 \] where \(\Phi_{n,t}\) is the noise spatio-temporal covariance matrix and \(\gamma_{x,m,t}\) is the speech spatio-temporal correlation vector. Solving this problem yields the binaural MFMVDR filter: \[ w_{m,t}^{\text{MFMVDR}}=\frac{\Phi_{n,t}^{-1}\gamma_{x,m,t}}{\gamma_{x,m,t}^H\Phi_{n,t}^{-1}\gamma_{x,m,t}} \]

### Summary

The main contribution of this paper is a binaural MFMVDR filter that exploits the spatial and temporal correlations of both speech and noise. The filter is embedded in an end-to-end deep learning framework in which TCNs estimate the required parameters. In simulations with measured binaural room impulses and diverse noise sources at signal-to-noise ratios from -5 dB to 20 dB, imposing the binaural MFMVDR filter structure shows an advantage over directly estimating the binaural multi-frame filter coefficients with TCNs.
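
The stacking of the multi-microphone multi-frame vector and the closed-form MFMVDR solution from the formula section can be illustrated with a minimal NumPy sketch for a single frequency bin. This is not the authors' implementation: the function names `stack_multiframe` and `mfmvdr_filter` are hypothetical, the small diagonal loading term is an added assumption for numerical stability, and random values stand in for the covariance matrix and correlation vector that the paper estimates with TCNs.

```python
import numpy as np


def stack_multiframe(y, t, N):
    """Stack the N most recent frames of all M microphones at frame t.

    y has shape (M, T) and holds the STFT coefficients of one frequency bin.
    The result has length N*M, ordered microphone by microphone and
    newest frame first: [y_{1,t}, ..., y_{1,t-N+1}, y_{2,t}, ...].
    """
    if t < N - 1:
        raise ValueError("need at least N frames of history")
    # Slice is oldest-first, so reverse the frame axis to get newest-first.
    frames = y[:, t - N + 1 : t + 1][:, ::-1]
    return frames.reshape(-1)


def mfmvdr_filter(phi_n, gamma_x, diag_load=1e-8):
    """Return w = Phi^{-1} gamma / (gamma^H Phi^{-1} gamma).

    diag_load is an assumption of this sketch, not part of the formula.
    """
    dim = phi_n.shape[0]
    loaded = phi_n + diag_load * np.real(np.trace(phi_n)) / dim * np.eye(dim)
    # Solve a linear system instead of forming the explicit inverse.
    numerator = np.linalg.solve(loaded, gamma_x)
    return numerator / (gamma_x.conj() @ numerator)


# Toy data: M = 2 microphones (left, right), N = 3 frames, T = 8 time frames.
rng = np.random.default_rng(0)
M, N, T = 2, 3, 8
y = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
y_t = stack_multiframe(y, t=5, N=N)

# Random Hermitian positive definite stand-in for the noise covariance matrix.
A = rng.standard_normal((N * M, N * M)) + 1j * rng.standard_normal((N * M, N * M))
phi_n = A @ A.conj().T + np.eye(N * M)
# Random stand-in for the speech spatio-temporal correlation vector.
gamma_x = rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M)

w = mfmvdr_filter(phi_n, gamma_x)
x_hat = w.conj() @ y_t  # filtered output w^H y_t for this bin and frame
constraint = w.conj() @ gamma_x  # should be close to 1 (distortionless)
print(abs(constraint - 1.0) < 1e-9)
```

In the binaural setting this computation would be repeated per frequency bin and frame, once with the correlation vector for the left reference microphone and once for the right, both sharing the same noise covariance matrix.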

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction
