Abstract: Sound propagation is the process by which sound energy travels through a medium, such as air, to the surrounding environment as sound waves. The room impulse response (RIR) describes this process and is influenced by the positions of the source and listener, the room's geometry, and its materials. Physics-based acoustic simulators have been used for decades to compute accurate RIRs for specific acoustic environments. However, we have encountered limitations with existing acoustic simulators. To address these limitations, we propose three novel solutions. First, we introduce a learning-based RIR generator that is two orders of magnitude faster than an interactive ray-tracing simulator. Our approach can be trained to take both statistical and traditional parameters directly as input, and it can generate both monaural and binaural RIRs for both reconstructed and synthetic 3D scenes. The RIRs generated by our approach outperform those from interactive ray-tracing simulators in speech-processing applications, including automatic speech recognition (ASR), speech enhancement, and speech separation. Second, we propose estimating RIRs from reverberant speech signals and visual cues without a 3D representation of the environment. By estimating RIRs from reverberant speech, we can augment training data to match test data, improving the word error rate of the ASR system. Our estimated RIRs achieve a 6.9% improvement over previous learning-based RIR estimators in far-field ASR tasks. We demonstrate, through perceptual evaluation, that our audio-visual RIR estimator aids tasks such as visual acoustic matching, novel-view acoustic synthesis, and voice dubbing. Finally, we introduce IR-GAN, which uses real RIRs to generate accurate augmented RIRs. IR-GAN parametrically controls acoustic parameters learned from real RIRs to generate new RIRs that imitate different acoustic environments, outperforming ray-tracing simulators on the far-field ASR benchmark by 8.95%.
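To make the augmentation idea above concrete, the following is a minimal sketch of how an RIR is typically applied to clean speech: the reverberant signal is the convolution of the clean signal with the RIR. The function names `toy_rir` and `reverberate` are hypothetical, and the toy RIR (exponentially decaying noise) is only a stand-in for illustration; it is not the learned generator, the estimator, or IR-GAN described in the abstract.

```python
import numpy as np

def toy_rir(rt60_s, sample_rate, rng):
    """Hypothetical stand-in RIR: Gaussian noise whose amplitude envelope
    decays by 60 dB over rt60_s seconds. Not the method from the abstract."""
    n = int(round(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    # ln(1000) ~= 6.908 gives a 60 dB amplitude drop at t = rt60_s.
    envelope = np.exp(-np.log(1000.0) * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude direct-path component
    return rir / np.max(np.abs(rir))

def reverberate(clean, rir):
    """Simulate far-field speech by convolving clean speech with an RIR,
    truncated to the length of the clean signal."""
    return np.convolve(clean, rir)[: len(clean)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16000
    clean = rng.standard_normal(sample_rate)  # placeholder for 1 s of speech
    rir = toy_rir(0.5, sample_rate, rng)
    reverberant = reverberate(clean, rir)
    print(reverberant.shape)
```

In an augmentation pipeline of this kind, each clean training utterance would be convolved with an RIR chosen to resemble the test-time acoustics; where those RIRs come from (a simulator, an estimator, or a generative model) is the part the abstract addresses.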

NeuralSound

Deep-Modal: Real-Time Impact Sound Synthesis for Arbitrary Shapes

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

Scene-Aware Audio Rendering via Deep Acoustic Analysis

DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks

Sound wave neural network based on partial differential equation

Physics-informed neural networks for one-dimensional sound field predictions with parameterized sources and impedance boundaries

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

Sound Field Estimation around a Rigid Sphere with Physics-informed Neural Network

Acoustic Volume Rendering for Neural Impulse Response Fields

Physics-Informed Neural Network for Volumetric Sound field Reconstruction of Speech Signals

Physics-inspired Neuroacoustic Computing Based on Tunable Nonlinear Multiple-scattering

Points2Sound: From mono to binaural audio using 3D point cloud scenes

Synthesis of Soundfields through Irregular Loudspeaker Arrays Based on Convolutional Neural Networks

Interactive Neural Resonators

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

Sound field reconstruction using neural processes with dynamic kernels

Efficient learning-based sound propagation for virtual and real-world audio processing applications