State-Space Estimation of Spatially Dynamic Room Impulse Responses using a Room Acoustic Model-based Prior

Kathleen MacWilliam, Thomas Dietzen, Randall Ali, Toon van Waterschoot
DOI: https://doi.org/10.3389/frsip.2024.1426082
2024-11-13
Abstract: The estimation of room impulse responses (RIRs) between static loudspeaker and microphone locations can be done using a number of well-established measurement and inference procedures. While these procedures assume a time-invariant acoustic system, time variations need to be considered for the case of spatially dynamic scenarios where loudspeakers and microphones are subject to movement. If the RIR is modeled using image sources, then movement implies that the distance to each image source varies over time, making the estimation of the spatially dynamic RIR particularly challenging. In this paper, we propose a procedure to estimate the early part of the spatially dynamic RIR between a stationary source and a microphone moving on a linear trajectory at constant velocity. The procedure is built upon a state-space model, where the state to be estimated represents the early RIR, the observation corresponds to a microphone recording in a spatially dynamic scenario, and time-varying distances to the image sources are incorporated into the state transition matrix obtained from static RIRs at the start and end point of the trajectory. The performance of the proposed approach is evaluated against state-of-the-art RIR interpolation and state-space estimation methods using simulations, demonstrating the potential of the proposed state-space model.
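The abstract's point that movement makes every image-source distance time-varying can be illustrated with a minimal sketch. It assumes a shoebox room, first-order image sources only, and a speed of sound of 343 m/s. The room size, source position, trajectory, and the helper names `first_order_images`, `mic_trajectory`, and `image_paths` are hypothetical. This is not the paper's code. It only shows how each reflection's distance, delay (in fractional samples), and 1/(4πd) spherical-spreading gain change as the microphone moves at constant velocity.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def first_order_images(src, room):
    """Direct source plus the six first-order image sources of a shoebox room
    whose walls lie at 0 and room[axis] along each axis."""
    src = np.asarray(src, dtype=float)
    images = [src.copy()]
    for axis in range(3):
        lo = src.copy(); lo[axis] = -src[axis]                    # mirror in wall at 0
        hi = src.copy(); hi[axis] = 2.0 * room[axis] - src[axis]  # mirror in far wall
        images.extend([lo, hi])
    return np.stack(images)  # shape (7, 3)

def mic_trajectory(start, end, duration, fs):
    """Microphone positions on a straight line at constant velocity, sampled at fs."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    t = np.arange(int(round(duration * fs))) / fs
    velocity = (end - start) / duration
    return t, start + t[:, None] * velocity  # shape (n_samples, 3)

def image_paths(images, mic_pos, fs):
    """Per-sample distance, delay in samples, and 1/(4*pi*d) gain for each image."""
    d = np.linalg.norm(mic_pos[:, None, :] - images[None, :, :], axis=-1)
    return d, d / C * fs, 1.0 / (4.0 * np.pi * d)

# Hypothetical example geometry: a 6 x 4 x 3 m room, microphone moving 2 m along x in 2 s.
room = np.array([6.0, 4.0, 3.0])
src = np.array([1.0, 2.0, 1.5])
fs = 8000
images = first_order_images(src, room)
t, mic = mic_trajectory([3.0, 1.0, 1.2], [5.0, 1.0, 1.2], duration=2.0, fs=fs)
d, delay, gain = image_paths(images, mic, fs)
print("direct-path delay: %.1f -> %.1f samples" % (delay[0, 0], delay[-1, 0]))
```

Each column of `delay` drifts at its own rate, so the early reflections do not all shift together. This is why a single global time shift cannot describe the moving-microphone RIR.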
Audio and Speech Processing, Signal Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the problem of accurately estimating the early part of the spatially dynamic room impulse response (RIR) in time-varying acoustic scenes. It considers the specific case of a stationary sound source and a microphone moving along a linear trajectory at constant velocity. The focus is on estimating the early RIR of this time-varying system with a state-space model.

### Background and challenges

1. **RIR estimation in static scenes**:
   - In static scenes, i.e., when the positions of the sound source and the microphone are fixed, the RIR can be estimated with a number of well-established measurement and inference methods.
   - These methods assume a time-invariant acoustic system, i.e., the RIR is the impulse response of a linear time-invariant (LTI) system.
2. **Challenges in dynamic scenes**:
   - In dynamic scenes, the position of the sound source or the microphone changes, so the RIR varies over time.
   - If the RIR is modeled with the image source model (ISM), the distance to each image source changes over time, which makes estimating the dynamic RIR particularly challenging.

### Solutions

1. **State-space model**:
   - The paper proposes a method based on a state-space model in which the state to be estimated represents the early RIR.
   - The observations correspond to microphone recordings in the dynamic scene, and the state-transition matrix is obtained from the static RIRs at the start and end points of the trajectory.
2. **State-transition matrix**:
   - The state-transition matrix is derived from the static RIRs via the image source model (ISM) and incorporates the time-varying distances to the image sources.
   - This matrix models how the early RIR changes as the microphone moves between positions.
3. **Performance evaluation**:
   - In simulations, the proposed method is compared with state-of-the-art RIR interpolation and state-space estimation methods, demonstrating the potential of the proposed state-space model.

### Key technical points

- **State-space model**: describes how the RIR evolves over time.
- **Image source model (ISM)**: models the reflection paths in the RIR.
- **Dynamic time warping (DTW)**: used to align and interpolate the early reflections in the RIR.

### Conclusion

In the simulated scenarios, the proposed method estimates the early RIR more accurately in dynamic scenes than existing interpolation and state-space estimation methods. This is relevant for applications such as virtual acoustic environments and echo cancellation.
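The generic state-space formulation of RIR tracking can be illustrated with a minimal Kalman filter sketch. The state is an L-tap RIR h[n] and the observation is the microphone sample y[n] = x[n]^T h[n] + v[n], where x[n] holds the last L samples of a known source signal. The sketch makes three assumptions: Gaussian process and observation noise, a known excitation, and a constant transition matrix `A`. The identity (random-walk) transition used in the example is only a placeholder. The paper instead builds its transition matrix from image-source distances and the static RIRs at the trajectory's start and end points, which this sketch does not reproduce. The function name `kalman_rir` and all parameter values are hypothetical.

```python
import numpy as np

def kalman_rir(s, y, L, A, Q, r, h0=None, P0=None):
    """Track an L-tap RIR under the (assumed) model
        h[n] = A h[n-1] + w[n],    w ~ N(0, Q)
        y[n] = x[n]^T h[n] + v[n], v ~ N(0, r)
    with regressor x[n] = [s[n], s[n-1], ..., s[n-L+1]]."""
    h = np.zeros(L) if h0 is None else np.asarray(h0, float).copy()
    P = np.eye(L) if P0 is None else np.asarray(P0, float).copy()
    x = np.zeros(L)
    H = np.empty((len(y), L))
    for n in range(len(y)):
        x = np.roll(x, 1)
        x[0] = s[n]                      # shift the newest source sample into the regressor
        h = A @ h                        # predict state
        P = A @ P @ A.T + Q              # predict error covariance
        Px = P @ x
        gain = Px / (x @ Px + r)         # Kalman gain for a scalar observation
        h = h + gain * (y[n] - x @ h)    # correct with the innovation
        P = P - np.outer(gain, Px)       # P <- (I - k x^T) P
        P = 0.5 * (P + P.T)              # keep the covariance numerically symmetric
        H[n] = h
    return H

# Hypothetical example: a static 8-tap RIR driven by white noise.
rng = np.random.default_rng(0)
L = 8
h_true = rng.standard_normal(L) * np.exp(-np.arange(L) / 3.0)
s = rng.standard_normal(2000)
y = np.convolve(s, h_true)[: len(s)] + 1e-2 * rng.standard_normal(len(s))
H = kalman_rir(s, y, L, A=np.eye(L), Q=1e-6 * np.eye(L), r=1e-4)
misalignment = np.sum((H[-1] - h_true) ** 2) / np.sum(h_true ** 2)
print("final normalized misalignment: %.2e" % misalignment)
```

With `A` equal to the identity, the filter can only follow slow, unstructured drifts of the taps. A model-based prior enters by replacing `A` with a time-varying matrix that moves each reflection according to its changing image-source delay. The sketch leaves that construction open.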