Blind Identification of Binaural Room Impulse Responses from Smart Glasses

Thomas Deppisch, Nils Meyer-Kahlen, Sebastià V. Amengual Garí

DOI: https://doi.org/10.1109/TASLP.2024.3454964

2024-09-23

Abstract: Smart glasses are increasingly recognized as a key medium for augmented reality, offering a hands-free platform with integrated microphones and non-ear-occluding loudspeakers to seamlessly mix virtual sound sources into the real-world acoustic scene. To convincingly integrate virtual sound sources, the room acoustic rendering of the virtual sources must match the real-world acoustics. Information about a user's acoustic environment, however, is typically not available. This work uses a microphone array in a pair of smart glasses to blindly identify binaural room impulse responses (BRIRs) from a few seconds of speech in the real-world environment. The proposed method uses dereverberation and beamforming to generate a pseudo reference signal that is used by a multichannel Wiener filter to estimate room impulse responses, which are then converted to BRIRs. The multichannel room impulse responses can also be used to estimate room acoustic parameters, an approach that is shown to outperform baseline algorithms in the estimation of reverberation time and direct-to-reverberant energy ratio. Results from a listening experiment further indicate that the estimated BRIRs often reproduce the real-world room acoustics perceptually more convincingly than measured BRIRs from other rooms of similar size.
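The two room acoustic parameters named in the abstract, reverberation time and direct-to-reverberant energy ratio (DRR), are commonly computed from an impulse response. The following is a minimal sketch of that textbook computation, not the estimator evaluated in the paper. The function names, the -5 dB to -25 dB fit range (a T20-style fit), and the 2.5 ms direct-sound window are assumptions chosen for illustration.

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the decay curve between lo_db and hi_db (assumed range)
    and extrapolate the slope to a 60 dB decay."""
    edc = schroeder_edc_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second
    return -60.0 / slope

def estimate_drr_db(rir, fs, window_ms=2.5):
    """Energy in a short window around the strongest peak (taken as the
    direct sound) relative to the energy of everything after that window."""
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    end = peak + half + 1
    direct = np.sum(rir[start:end] ** 2)
    reverb = np.sum(rir[end:] ** 2)
    return 10.0 * np.log10(direct / reverb)

if __name__ == "__main__":
    # Synthetic impulse response (assumption): a unit direct impulse followed
    # by exponentially decaying noise with a nominal 0.5 s reverberation time.
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    rir = 0.1 * rng.standard_normal(fs) * np.exp(-6.908 * t / 0.5)
    rir[0] = 1.0
    print("RT60 estimate (s):", estimate_rt60(rir, fs))
    print("DRR estimate (dB):", estimate_drr_db(rir, fs))
```

In a blind setting the impulse response passed to these functions would itself be an estimate, so errors in the estimated response carry over to both parameters.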

Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses how to blindly identify binaural room impulse responses (BRIRs) using the microphone array built into smart glasses, so that in augmented reality (AR) applications the room acoustic rendering of virtual sound sources matches the real-world acoustic environment. Specifically, it estimates BRIRs from a few seconds of real-world speech, without prior knowledge of the user's acoustic environment, so that virtual sounds blend into the actual scene. The method combines dereverberation, beamforming, and multichannel Wiener filtering, with the goal of making virtual sound sources in AR more realistic and immersive.
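The Wiener filtering step can be illustrated with a simplified sketch: given a reference signal and the microphone signals, a least-squares FIR fit recovers one impulse response per microphone. This is a time-domain toy version under strong assumptions, not the paper's implementation. Here the reference is a known clean signal, whereas the paper derives a pseudo reference through dereverberation and beamforming. The function name `wiener_rir_estimate` and all parameters are hypothetical.

```python
import numpy as np

def wiener_rir_estimate(reference, mics, rir_len):
    """Least-squares FIR estimate from one reference to each microphone.

    reference: (n_samples,) reference signal (assumed known in this sketch)
    mics:      (n_samples, n_mics) microphone signals
    returns:   (rir_len, n_mics) estimated impulse responses
    """
    n = len(reference)
    # Convolution matrix: column k holds the reference delayed by k samples.
    X = np.zeros((n, rir_len))
    for k in range(rir_len):
        X[k:, k] = reference[: n - k]
    # Minimising ||X h - y||^2 for each channel solves the normal equations
    # (X^T X) h = X^T y, the sample-based form of the Wiener-Hopf equations.
    h, *_ = np.linalg.lstsq(X, mics, rcond=None)
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, rir_len, n_mics = 4000, 32, 2
    reference = rng.standard_normal(n)
    # Hypothetical ground-truth responses: decaying noise per microphone.
    decay = np.exp(-np.arange(rir_len) / 8.0)[:, None]
    true_rirs = rng.standard_normal((rir_len, n_mics)) * decay
    mics = np.stack(
        [np.convolve(reference, true_rirs[:, m])[:n] for m in range(n_mics)],
        axis=1,
    )
    mics += 0.01 * rng.standard_normal(mics.shape)  # small sensor noise
    est = wiener_rir_estimate(reference, mics, rir_len)
    err = np.linalg.norm(est - true_rirs) / np.linalg.norm(true_rirs)
    print("relative estimation error:", err)
```

The quality of such an estimate depends directly on how close the reference is to the dry source signal, which is why the pseudo reference generation matters in the blind case.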

Blind Spatial Impulse Response Generation from Separate Room- and Scene-Specific Information

A binaural room impulse response dataset and Shorelining psychophysical task for the evaluation of auditory sensory augmentation

A Wearable Vision-To-Audio Sensory Substitution Device for Blind Assistance and the Correlated Neural Substrates

An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment

Blind identification of Ambisonic reduced room impulse response

Blind Localization of Room Reflections with Application to Spatial Audio

Blind Room Parameter Estimation Using Multiple-Multichannel Speech Recordings

Room-aware portable Auditory Augmented Reality: Real-time spatial audio generation with geometric data analysis

The impact of binaural auralizations on sound source localization and social presence in audiovisual virtual reality: converging evidence from placement and eye-tracking paradigms

AV-RIR: Audio-Visual Room Impulse Response Estimation

BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

Hearing Anything Anywhere

Blind Acoustic Room Parameter Estimation Using Phase Features

The Study On Simulating Binaural Room Impulse Response

Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network

Realistic sources, receivers and walls improve the generalisability of virtually-supervised blind acoustic parameter estimators

Insights into the Incorporation of Signal Information in Binaural Signal Matching with Wearable Microphone Arrays

Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks