Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene.

What problem does this paper attempt to address?

This paper addresses the problem of reconstructing the spatial acoustic characteristics of an arbitrary environment from sparse room impulse response (RIR) measurements and a planar reconstruction of the scene. Specifically, the authors aim to simulate any source audio at any location in the space using only a small number (about 12) of RIR recordings together with the planar scene reconstruction, thereby achieving "Hearing Anything Anywhere". The goal resembles the sparse-view novel view synthesis (NVS) task in computer vision and graphics, but sound is a time-varying signal that propagates far more slowly than light, which makes common visual NVS methods unsuitable for audio.

To achieve this goal, the paper introduces DiffRIR, a differentiable RIR rendering framework built on interpretable parametric models of the salient acoustic features of the scene, such as the directivity of the sound source and the reflectivity of the surfaces. With these models, DiffRIR can synthesize new auditory experiences at any location in the space. The paper follows an analysis-by-synthesis paradigm: it recovers physically interpretable parameters of the sound source and the surfaces in the scene by minimizing the difference between DiffRIR's rendered output and the measured RIRs.

The main contributions of the paper are:
1. DiffRIR, a differentiable acoustic inverse-rendering framework that can recover the immersive sound field of a room from a sparse set of RIR measurements.
2. A new dataset of real RIRs measured at hundreds of locations in four different real-world environments.
3. Comparisons with existing methods in various settings, showing that in practical data-limited scenarios DiffRIR is more effective than existing methods and predicts more accurate RIRs and music at unseen locations.

In summary, this research aims to capture a real-world acoustic space with a minimal hardware setup (such as 12 microphones), which makes it more practical for many consumer applications.
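Once an RIR has been predicted for a listener position, "any source audio" can be rendered there by convolving the dry source signal with that RIR. The snippet below is a minimal sketch of that final step only, not the paper's implementation: the function name `render_at_listener` and the toy four-sample RIR are hypothetical and chosen purely for illustration.

```python
import numpy as np


def render_at_listener(source_audio, rir):
    """Simulate what a listener hears by convolving dry audio with an RIR.

    Both arguments are 1-D sequences sampled at the same rate.
    (Hypothetical helper for illustration, not part of DiffRIR.)
    """
    source_audio = np.asarray(source_audio, dtype=float)
    rir = np.asarray(rir, dtype=float)
    # Full linear convolution: output length is len(source) + len(rir) - 1.
    return np.convolve(source_audio, rir, mode="full")


# Toy RIR (an assumption): direct sound at sample 0, plus one reflection
# arriving 3 samples later at half the amplitude.
toy_rir = np.array([1.0, 0.0, 0.0, 0.5])
dry = np.array([1.0, 2.0, 0.0, 0.0])

wet = render_at_listener(dry, toy_rir)
print(wet)
```

In this toy case the output contains the original two samples followed by a delayed, half-amplitude copy, which is the echo introduced by the single reflection. A real RIR is thousands of samples long, and for such lengths an FFT-based convolution is usually preferred for speed.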

Hearing Anything Anywhere
