Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu
2024-09-16
Abstract: Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently outperforms traditional signal processing (SP) and DL-based methods, providing significantly better timbral and spatial quality and higher source localization accuracy. Binaural audio demos with visualizations are available at https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
Audio and Speech Processing
What problem does this paper attempt to address?
The main objective of this paper is to improve Ambisonic encoding in multi-speaker scenarios, particularly when encoding full 3D Ambisonic signals using a horizontal circular microphone array. Specifically, the authors propose a deep learning-based approach built on a two-stage network architecture:

1. **Addressing the limitations of traditional methods**: Traditional Ambisonic encoding methods typically rely on spherical microphone arrays to capture sound field information efficiently, which limits their flexibility in practical applications. The paper therefore develops a deep learning method that works with a circular array instead.
2. **Introducing innovative techniques**:
   - **Two-stage network architecture**: Mimics the process of plane wave decomposition followed by Ambisonics synthesis, thereby better extracting spatial information from the microphone signals.
   - **Loss function based on spatial power maps**: Regularizes the inter-channel correlations of the Ambisonic signals.
   - **Channel permutation technique**: Resolves the ambiguity of encoding vertical information using a horizontal circular array.
3. **Performance improvement**: In evaluations on simulated speech and noise datasets, the proposed method consistently outperforms traditional signal processing (SP) and deep learning (DL) baselines, with significantly better timbral and spatial quality and higher source localization accuracy.

In summary, this paper aims to enhance the effectiveness of Ambisonic encoding in multi-speaker scenarios through deep learning techniques, especially when using a horizontal circular array.
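
The vertical ambiguity mentioned above can be illustrated with a small sketch. It does not reproduce the paper's network or its channel permutation technique. It only shows, under idealized assumptions (far-field plane wave, free field, omnidirectional microphones), why a horizontal circular array cannot tell a source above the plane from its mirror image below it, while the target SOA signals differ for the two cases. The ACN/SN3D channel convention, the function names, and the array geometry (6 microphones, 5 cm radius) are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def soa_gains(azimuth, elevation):
    """Plane-wave encoding gains for second-order Ambisonics (9 channels).

    Assumes ACN channel ordering and SN3D normalization; angles in radians.
    """
    ca, sa = math.cos(azimuth), math.sin(azimuth)
    ce, se = math.cos(elevation), math.sin(elevation)
    k = math.sqrt(3.0) / 2.0
    return [
        1.0,                                  # ACN 0 (W)
        sa * ce,                              # ACN 1 (Y)
        se,                                   # ACN 2 (Z)
        ca * ce,                              # ACN 3 (X)
        k * math.sin(2 * azimuth) * ce * ce,  # ACN 4 (V)
        k * sa * math.sin(2 * elevation),     # ACN 5 (T)
        0.5 * (3.0 * se * se - 1.0),          # ACN 6 (R)
        k * ca * math.sin(2 * elevation),     # ACN 7 (S)
        k * math.cos(2 * azimuth) * ce * ce,  # ACN 8 (U)
    ]

def circular_array_delays(azimuth, elevation, num_mics=6, radius=0.05, c=343.0):
    """Far-field arrival-time offsets (seconds) relative to the array centre.

    Microphones sit on a horizontal circle (z = 0). num_mics and radius are
    hypothetical values, not the geometry used in the paper.
    """
    # Horizontal components of the unit vector pointing towards the source.
    # The vertical component never enters because every microphone has z = 0.
    ux = math.cos(azimuth) * math.cos(elevation)
    uy = math.sin(azimuth) * math.cos(elevation)
    delays = []
    for m in range(num_mics):
        alpha = 2.0 * math.pi * m / num_mics
        px, py = radius * math.cos(alpha), radius * math.sin(alpha)
        delays.append(-(px * ux + py * uy) / c)
    return delays

# Same azimuth, mirrored elevation (+30 degrees vs. -30 degrees).
az, el = math.radians(40.0), math.radians(30.0)
up, down = soa_gains(az, el), soa_gains(az, -el)
d_up = circular_array_delays(az, el)
d_down = circular_array_delays(az, -el)

# The array observes identical delays for both sources...
same_delays = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(d_up, d_down))
# ...but some of the target Ambisonic channels differ.
flipped = [n for n in range(9) if not math.isclose(up[n], down[n], abs_tol=1e-12)]
print(same_delays, flipped)
```

In this idealized model the microphone delays are identical for the two mirrored sources, and the only Ambisonic channels that differ are those with an odd dependence on height (ACN 2, 5, and 7), which change sign. Any encoder driven by such an array therefore needs additional information or a convention to decide the sign of those channels, which is the gap the paper's permutation technique is described as addressing.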