A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

Jingyuan Wang, Jie Zhang, Shihao Chen, Miao Sun

2024-09-19

Abstract: Binaural speech enhancement (BSE) aims to jointly improve the speech quality and intelligibility of noisy signals received by hearing devices and preserve the spatial cues of the target for natural listening. Existing methods often suffer from a trade-off between noise reduction (NR) capacity and spatial cues preservation (SCP) accuracy, and from a high computational demand in complex acoustic scenes. In this work, we present a learning-based lightweight binaural complex convolutional network (LBCCN), which excels in NR by filtering low-frequency bands and keeping the rest. Additionally, our approach explicitly incorporates the estimation of the interchannel relative acoustic transfer function to ensure spatial cue fidelity and speech clarity. Results show that the proposed LBCCN can achieve NR performance comparable to state-of-the-art methods under various noise conditions, but with a much lower computational cost and better SCP. The reproducible code and audio examples are available at https://github.com/jywanng/LBCCN.

Sound, Artificial Intelligence, Audio and Speech Processing

What problem does this paper attempt to address?

The paper addresses two key challenges in Binaural Speech Enhancement (BSE): Noise Reduction (NR) and Spatial Cues Preservation (SCP). Specifically, existing methods often face the following issues in complex acoustic scenes:

1. **Trade-off between noise reduction and spatial cues preservation**: Existing methods usually struggle to reduce noise effectively while preserving the spatial cues of the target speech, which makes it difficult for users of hearing devices (such as hearing aids and headphones) to get a natural listening experience in noisy environments.
2. **High computational demand**: Many existing methods need substantial computational resources in complex acoustic scenes, which makes them difficult to deploy on low-resource devices with real-time requirements.

To address these issues, the paper proposes a Lightweight Binaural Complex Convolutional Network (LBCCN), which achieves noise reduction performance comparable to existing state-of-the-art methods under various noise conditions, with lower computational cost and better spatial cues preservation.
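To make "spatial cues" and the interchannel relative transfer function more concrete, here is a minimal NumPy sketch. It is not the LBCCN implementation: the function names, the STFT settings, and the least-squares estimator are assumptions chosen for illustration. It computes two standard binaural cues, the interaural level difference (ILD) and interaural phase difference (IPD), and a per-frequency relative transfer function from the left to the right channel.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def spatial_cues(left, right, eps=1e-8):
    """Return (ILD in dB, IPD in radians) per time-frequency bin."""
    L, R = stft(left), stft(right)
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))
    return ild, ipd

def estimate_rtf(left, right, eps=1e-8):
    """Least-squares relative transfer function h(f), so that R(t, f) ~ h(f) * L(t, f).

    Illustrative estimator only (an assumption, not the paper's method).
    """
    L, R = stft(left), stft(right)
    num = (np.conj(L) * R).sum(axis=0)       # cross-spectrum summed over frames
    den = (np.abs(L) ** 2).sum(axis=0) + eps  # left-channel power summed over frames
    return num / den

# Toy example: the right ear receives the left signal at half amplitude, no delay.
rng = np.random.default_rng(0)
left = rng.standard_normal(16000)
right = 0.5 * left

ild, ipd = spatial_cues(left, right)
rtf = estimate_rtf(left, right)
```

In this toy case the ILD is about 6 dB in every bin, the IPD is zero, and the estimated transfer function is close to 0.5 at every frequency. Comparing such cues between a clean binaural reference and an enhanced output is one common way to quantify how well spatial cues are preserved.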

ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

Low bit rate binaural link for improved ultra low-latency low-complexity multichannel speech enhancement in Hearing Aids

Binaural Deep Neural Network for Robust Speech Enhancement

A Supervised Speech Enhancement Method for Smartphone-Based Binaural Hearing Aids

Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement

Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

An RNN-based Speech Enhancement Method for a Binaural Hearing Aid System

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

Speech Enhancement Based on Binaural Sound Source Localization and Cosh Measure Wiener Filtering

Effective binaural multi-channel processing algorithm for improved environmental presence

SE Territory: Monaural Speech Enhancement Meets the Fixed Virtual Perceptual Space Mapping

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers

Ultra-Low Latency Speech Enhancement - A Comprehensive Study

LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement

Real-time binaural speech separation with preserved spatial cues

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

Improving Monaural Speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions