Blind Identification of Binaural Room Impulse Responses from Smart Glasses

Thomas Deppisch, Nils Meyer-Kahlen, Sebastià V. Amengual Garí
DOI: https://doi.org/10.1109/TASLP.2024.3454964
2024-09-23
Abstract: Smart glasses are increasingly recognized as a key medium for augmented reality, offering a hands-free platform with integrated microphones and non-ear-occluding loudspeakers to seamlessly mix virtual sound sources into the real-world acoustic scene. To convincingly integrate virtual sound sources, the room acoustic rendering of the virtual sources must match the real-world acoustics. Information about a user's acoustic environment, however, is typically not available. This work uses a microphone array in a pair of smart glasses to blindly identify binaural room impulse responses (BRIRs) from a few seconds of speech in the real-world environment. The proposed method uses dereverberation and beamforming to generate a pseudo reference signal that is used by a multichannel Wiener filter to estimate room impulse responses, which are then converted to BRIRs. The multichannel room impulse responses can also be used to estimate room acoustic parameters, an approach that is shown to outperform baseline algorithms in the estimation of reverberation time and direct-to-reverberant energy ratio. Results from a listening experiment further indicate that the estimated BRIRs often reproduce the real-world room acoustics perceptually more convincingly than measured BRIRs from other rooms of similar size.
Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to blindly identify binaural room impulse responses (BRIRs) with the microphone array integrated into smart glasses, so that in augmented reality (AR) applications the room acoustic rendering of virtual sound sources matches the acoustics of the real environment. Specifically, it estimates BRIRs from a few seconds of real-world speech, without prior information about the user's acoustic environment. The method first applies dereverberation and beamforming to the microphone signals to obtain a pseudo reference signal, then uses a multichannel Wiener filter to estimate room impulse responses, which are converted to BRIRs. The ultimate goal is to make virtual sound sources in AR blend more realistically and immersively into the actual environment.
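
The filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a clean reference signal is already available (in the paper, a pseudo reference is obtained through dereverberation and beamforming), it ignores noise, and it reduces the Wiener filter to a regularized frequency-domain deconvolution applied to each microphone channel separately. The function name `estimate_rirs` and the parameters `rir_len` and `eps` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def estimate_rirs(reference, mics, rir_len, eps=1e-8):
    """Estimate one impulse response per microphone from a reference signal.

    reference: 1-D array, the (pseudo) reference signal.
    mics: 2-D array of shape (n_mics, n_samples), the microphone signals.
    Returns an array of shape (n_mics, rir_len).
    """
    reference = np.asarray(reference, dtype=float)
    mics = np.atleast_2d(np.asarray(mics, dtype=float))
    # Zero-pad so that circular convolution in the DFT domain
    # matches linear convolution in the time domain.
    n_fft = max(len(reference), mics.shape[1]) + rir_len
    S = np.fft.rfft(reference, n=n_fft)
    X = np.fft.rfft(mics, n=n_fft, axis=1)
    # Wiener-style estimate: cross-spectrum over regularized auto-spectrum.
    H = X * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n=n_fft, axis=1)[:, :rir_len]


# Synthetic check with a white-noise reference and two made-up decaying responses.
rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)
true_rirs = rng.standard_normal((2, 64)) * np.exp(-np.arange(64) / 10.0)
mics = np.stack([np.convolve(reference, h) for h in true_rirs])
estimated = estimate_rirs(reference, mics, rir_len=64)
print(np.max(np.abs(estimated - true_rirs)))
```

In this noise-free toy case the estimate recovers the synthetic responses almost exactly. With real speech, the reference is only an approximation of the source signal and the spectra would have to be averaged over many frames, so the estimation problem is considerably harder than this sketch suggests.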