Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods meet personalization needs by measuring individual HRTFs. The second is insufficient accuracy, because traditional methods optimize to retain the information that matters more for perception at the cost of discarding the rest. It is therefore desirable to develop techniques that achieve personalization and accuracy at low cost. To this end, we focus on the binaural rendering of ambisonics and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function that quantifies interaural level differences to handle spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the accuracy of content information and spatial information in end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and reveal the shortcomings of previous methods.
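To make the idea of a loss that quantifies interaural level differences (ILD) concrete, here is a minimal sketch. It assumes the ILD is measured per time-frequency bin as the left-to-right magnitude ratio in dB, computed from a Hann-windowed STFT, and that the loss is the mean absolute ILD error between a predicted and a reference binaural signal. The function names (`stft_mag`, `ild_db`, `ild_loss`) and the STFT parameters are hypothetical illustrations; the abstract does not specify the exact formulation used in the paper.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT of a 1-D signal with a Hann window (no padding).

    Assumption: a plain framed FFT is enough for illustration."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (n_frames, n_fft // 2 + 1)

def ild_db(binaural, n_fft=512, hop=128, eps=1e-8):
    """Per-bin ILD in dB for a (2, T) binaural signal: left level over right level."""
    left = stft_mag(binaural[0], n_fft, hop)
    right = stft_mag(binaural[1], n_fft, hop)
    # eps avoids division by zero and log of zero in silent bins
    return 20.0 * np.log10((left + eps) / (right + eps))

def ild_loss(pred, target, n_fft=512, hop=128):
    """Hypothetical ILD loss: mean absolute ILD error (dB) between two binaural signals."""
    return float(np.mean(np.abs(ild_db(pred, n_fft, hop) - ild_db(target, n_fft, hop))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(16000)
    target = np.stack([noise, 0.5 * noise])  # right ear about 6 dB quieter
    pred = np.stack([noise, noise])          # prediction with no level difference
    print(ild_loss(pred, target))            # close to 20*log10(2), about 6.02 dB
```

Because the ILD is a level ratio between the ears, this term is insensitive to a gain applied equally to both channels, so in practice it would be combined with a content-oriented loss (for example, a waveform or spectral loss) rather than used alone.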

Binaural Noise Reduction in the Time Domain with a Stereo Setup

Effective binaural multi-channel processing algorithm for improved environmental presence

Speech Dereverberation and Noise Reduction for both diffusive noise field and point noise source in Binaural Hearing Aids: Preliminary Version

Robust Binaural Rendering with the Time-Domain Underdetermined Multichannel Inverse Prefilters

Binaural Scene Analysis with Multidimensional Statistical Filters

Perceptually-Motivated Nonlinear Channel Decorrelation For Stereo Acoustic Echo Cancellation

Robust parameter design for Wiener-based binaural noise reduction methods in hearing aids

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation

End-to-End Paired Ambisonic-Binaural Audio Rendering

Improvement of Crosstalk Cancellation for Stereo Reproduction System Based on Two Loudspeakers

High-Fidelity Noise Reduction with Differentiable Signal Processing

Binaural sound source localization based on generalized parametric model and two-layer matching strategy in complex environments

Optimal Phase-Space Projection For Noise Reduction

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

A Multiband Noise Reduction Wiener Filter Algorithm for Hearing Aid

Real-time binaural speech separation with preserved spatial cues

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

Increasing speech intelligibility in monaural hearing by adding noise at the other ear

Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect