Abstract: We present SPEAR, a continuous receiver-to-receiver acoustic neural warping field for spatial acoustic effects prediction in an acoustic 3D space with a single stationary audio source. Unlike traditional source-to-receiver modelling methods, which require prior knowledge of the space's acoustic properties to rigorously model audio propagation from source to receiver, we propose to predict by warping the spatial acoustic effects from one reference receiver position to another target receiver position, so that the warped audio essentially accommodates all spatial acoustic effects belonging to the target position. SPEAR can be trained on data that is much more readily accessible: we simply ask two robots to independently record spatial audio at different positions. We further theoretically prove the universal existence of the warping field if and only if one audio source is present. Three physical principles are incorporated to guide SPEAR network design, making the learned warping field physically meaningful. We demonstrate SPEAR's superiority on synthetic, photo-realistic, and real-world datasets, showing the huge potential of SPEAR for various downstream robotic tasks.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the prediction of spatial acoustic effects from receiver to receiver in enclosed three-dimensional spaces. Specifically, the authors propose a new framework, SPEAR (Spatial Perceptual Acoustic Neural Warping Field), for predicting spatial acoustic effects such as reverberation, loudness variation, and resonance at any given receiver position.

Traditional methods mainly rely on source-to-receiver modeling. These methods require substantial prior knowledge, such as the geometric layout of the room, material properties, and the position of the sound source, to accurately simulate sound propagation from the source to the receiver. This prior knowledge is difficult to obtain in practice, and the simulation itself is computationally expensive. In addition, although some neural-network-based methods can learn continuous acoustic fields, they still require a large amount of room impulse response (RIR) data for training, which is also difficult to collect in real-world scenarios.

To solve these problems, SPEAR adopts a new **receiver-to-receiver perspective**. Using only audio recorded independently by two receivers at different positions, SPEAR learns to predict spatial acoustic effects without relying on RIR data or prior acoustic properties. The training data are easier to collect, and the learned field can efficiently predict spatial acoustic effects at any position.

### Main contributions

1. **Proposed a novel receiver-to-receiver spatial acoustic effect prediction framework**:
   - SPEAR does not require traditional RIR data or hard-to-obtain prior spatial acoustic properties. It is instead trained with more easily obtainable data (i.e., audio recorded by receivers at different positions).
2. **Theoretically proved the existence of the receiver-to-receiver neural warping field**:
   - The authors prove that the receiver-to-receiver warping field universally exists when there is a single stationary sound source in the 3D space. The network design is guided by three physical principles (globality, order perception, and audio-content independence), which make the learned field physically meaningful.
3. **Demonstrated the superior performance of SPEAR on synthetic, photo-realistic, and real-world datasets**:
   - The experimental results show that SPEAR performs well across these datasets, which indicates its potential for predicting spatial acoustic effects in a variety of downstream robotic tasks.

### Mathematical formula representation

- **Problem definition**:
  \[ W_{p_r \to p_t} = F_\theta(p_t, p_r); \quad \hat{X}_{p_r \to p_t}(f) = W_{p_r \to p_t} \cdot X_{p_r}(f) \]
  where \(\theta\) denotes the trainable parameters, \(p_r\) and \(p_t\) are the reference position and the target position respectively, and \(X_{p_r}(f)\) and \(\hat{X}_{p_r \to p_t}(f)\) are the discrete Fourier transform representations of the audio recorded at the reference position and of the warped audio at the target position respectively.
- **Optimization objective**:
  \[ F_\theta \leftarrow \arg \min_\theta L(\hat{X}_{p_1 \to p_2}(f), X_{p_2}(f)), \quad \forall p_1, p_2 \in P \]

In this way, SPEAR can predict the spatial acoustic effects at any position, supporting tasks such as immersive 3D audio experiences, robot relocalization, and manipulation.
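The frequency-domain warping step in the problem definition can be illustrated with a small toy example. The sketch below is not the SPEAR network: it assumes linear time-invariant propagation, so that each recording is the source signal convolved with a position-specific impulse response, and it uses circular convolution so that the DFT identity holds exactly. Under those assumptions an oracle warping field is simply the ratio of the two transfer functions, and it does not depend on the source content. All function names and the impulse-response values are hypothetical and chosen for illustration only; in SPEAR the field is predicted by \(F_\theta(p_t, p_r)\) rather than computed from known impulse responses.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """Circular convolution of signal x with a (shorter) impulse response h."""
    n = len(x)
    return [sum(x[(t - m) % n] * h[m] for m in range(len(h))) for t in range(n)]


def warping_field(h_ref, h_tgt, n):
    """Oracle W = H_tgt / H_ref per frequency bin (assumes H_ref has no zeros)."""
    H_ref = dft(list(h_ref) + [0.0] * (n - len(h_ref)))
    H_tgt = dft(list(h_tgt) + [0.0] * (n - len(h_tgt)))
    return [ht / hr for ht, hr in zip(H_tgt, H_ref)]


def warp(x_ref, W):
    """Apply X_hat(f) = W(f) * X_ref(f) and return the time-domain signal."""
    X_ref = dft(x_ref)
    return [v.real for v in idft([w * x for w, x in zip(W, X_ref)])]


def max_abs_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical toy impulse responses for the reference and target positions.
H_REF = [1.0, 0.0, 0.3]       # direct path plus one weak echo
H_TGT = [0.0, 0.6, 0.0, 0.2]  # delayed, attenuated path plus one echo
N = 64

W = warping_field(H_REF, H_TGT, N)

random.seed(0)
source_a = [random.uniform(-1.0, 1.0) for _ in range(N)]
source_b = [random.uniform(-1.0, 1.0) for _ in range(N)]

# The same W is applied to recordings of two different source signals.
err_a = max_abs_error(warp(circular_convolve(source_a, H_REF), W),
                      circular_convolve(source_a, H_TGT))
err_b = max_abs_error(warp(circular_convolve(source_b, H_REF), W),
                      circular_convolve(source_b, H_TGT))
print("max error, source A:", err_a)
print("max error, source B:", err_b)
```

The division in `warping_field` is only well defined when the reference transfer function is nonzero in every frequency bin, which is why the toy reference response was chosen with a dominant direct path. Reusing one `W` for both sources mirrors the audio-content independence principle listed above.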

SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field
