Using perceptive subbands analysis to perform audio scenes cartography

Laurent Millot, Gérard Pelé, Mohammed Elliq
2024-01-05
Abstract: Audio scene cartography for real or simulated stereo recordings is presented. The analysis proceeds in three successive steps: a perceptive 10-subband analysis; calculation of temporal laws for the relative delays and gains between the two channels of each subband, using a short-time constant-scene assumption and inter-channel correlation, which makes it possible to follow a mobile source as it moves; and calculation of global and per-subband histograms whose peaks give the incidence information for fixed sources. Audio scenes composed of 2 to 4 fixed sources, or of one fixed source and one mobile source, have already been tested successfully. Further extensions and applications will be discussed. Audio illustrations of audio scenes, subband analysis, and a demonstration of real-time stereo recording simulations will be given. Paper 6340 presented at the 118th Convention of the Audio Engineering Society, Barcelona, 2005.
Audio and Speech Processing, Sound, Signal Processing, Classical Physics
What problem does this paper attempt to address?
The paper presents a method for audio scene cartography: determining the number of sound sources in an audio scene and their likely locations through perceptual sub-band analysis of stereo recordings. Specifically, it addresses the following points:

1. **Perceptual Sub-band Analysis**: The audio signal is split into 10 sub-bands covering specific frequency ranges. This captures information from the different frequency components and supports the subsequent sound source localization.
2. **Sound Source Localization**: The relative delay (∆t) and attenuation (∆E) between the left and right channels within each sub-band are used to estimate where the sound sources are located. This approach is particularly effective for static sound sources.
3. **Dynamic Sound Source Tracking**: For moving sound sources, the paper proposes following how the relative delay and attenuation within a sub-band change over time, which makes it possible to track the movement of the source.
4. **Application to Multi-channel Recordings**: Although the paper focuses mainly on stereo recordings, the authors also mention the possibility of extending the method to multi-channel recordings.
5. **Practical Application Cases**: The paper demonstrates the method on three examples: a synthetic scene with static and moving sound sources, a synthetic scene of a jazz quartet, and a real scene recorded at a train station.

With these methods, the main sound sources in a complex audio environment can be identified, and static sound sources are localized well. For moving sound sources, location information is harder to obtain directly from the relative delay and attenuation, but further analysis can still yield useful results.
Additionally, the paper discusses future research directions, including establishing a trajectory library to assist in sound source localization, improving microphone models, and developing new multi-channel recording simulators.
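
To make the delay and attenuation analysis described above more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a single band of an already-filtered stereo signal (the 10-band perceptive filterbank is not reproduced here), estimates the inter-channel delay of each short frame from the peak of the cross-correlation, measures the level difference as a frame energy ratio in dB, and then builds a histogram of the per-frame delays whose peak indicates a fixed source. The function names, frame length, hop size, and maximum lag are all hypothetical choices made for illustration.

```python
import numpy as np

def frame_delay_and_level(left, right, max_lag):
    """Return (lag in samples, level difference in dB) of right relative to left.

    A positive lag means the right channel lags behind the left one.
    Both frames must have the same length n, with max_lag <= n - 1.
    """
    n = len(left)
    # In "full" mode, the value for lag k is stored at index k + n - 1.
    full = np.correlate(right, left, mode="full")
    window = full[n - 1 - max_lag : n + max_lag]   # lags -max_lag .. +max_lag
    lag = int(np.argmax(window)) - max_lag
    eps = 1e-12                                    # avoids log of zero on silence
    level_db = 10.0 * np.log10((np.sum(right**2) + eps) / (np.sum(left**2) + eps))
    return lag, level_db

def delay_and_level_laws(left, right, frame_len=1024, hop=512, max_lag=20):
    """Frame-by-frame delay and level, treating the scene as constant per frame."""
    lags, levels = [], []
    for start in range(0, len(left) - frame_len + 1, hop):
        lag, level = frame_delay_and_level(
            left[start:start + frame_len], right[start:start + frame_len], max_lag
        )
        lags.append(lag)
        levels.append(level)
    return np.array(lags, dtype=int), np.array(levels)

def dominant_lag(lags, max_lag):
    """Histogram the per-frame lags and return the lag at the highest peak."""
    counts = np.bincount(lags + max_lag, minlength=2 * max_lag + 1)
    return int(np.argmax(counts)) - max_lag

# Synthetic check (illustrative data): one fixed noise source whose right
# channel is delayed by 5 samples and attenuated by half.
rng = np.random.default_rng(0)
left = rng.standard_normal(8192)
right = np.zeros_like(left)
right[5:] = 0.5 * left[:-5]

lags, levels = delay_and_level_laws(left, right)
print(dominant_lag(lags, max_lag=20), float(np.median(levels)))
```

For this synthetic signal the histogram peak should fall at a lag of 5 samples, with a median level difference close to -6 dB. A moving source would show up as a drift in `lags` and `levels` over successive frames rather than as a single histogram peak.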