Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

2024-09-16

Abstract: Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently outperforms traditional signal processing (SP) and DL-based methods, providing significantly better timbral and spatial quality and higher source localization accuracy. Binaural audio demos with visualizations are available at https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
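For orientation, SOA carries (2 + 1)² = 9 channels, one per real spherical harmonic up to order 2. The sketch below is not the authors' network. It is a classical signal-processing analogue of a two-stage encoder: stage 1 performs a least-squares plane-wave decomposition of the circular-array STFT onto a horizontal grid of directions, and stage 2 synthesizes Ambisonics by weighting each plane wave with its spherical-harmonic values. It assumes AmbiX conventions (ACN channel order, SN3D normalization), a far-field uniform circular array, a speed of sound of 343 m/s, and a phase sign convention chosen for illustration. The helper names `sh_soa`, `circular_steering` and `encode_two_stage` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def sh_soa(az, el):
    """Real spherical harmonics up to order 2, ACN order, SN3D normalisation (AmbiX).

    az, el: azimuth and elevation in radians (scalars or 1-D arrays).
    Returns an array of shape (9, N).
    """
    az, el = np.broadcast_arrays(np.atleast_1d(np.asarray(az, dtype=float)),
                                 np.atleast_1d(np.asarray(el, dtype=float)))
    ce, se = np.cos(el), np.sin(el)
    k = np.sqrt(3.0) / 2.0
    return np.stack([
        np.ones_like(az),                  # ACN 0: W
        np.sin(az) * ce,                   # ACN 1: Y
        se,                                # ACN 2: Z
        np.cos(az) * ce,                   # ACN 3: X
        k * np.sin(2 * az) * ce ** 2,      # ACN 4: V
        k * np.sin(az) * np.sin(2 * el),   # ACN 5: T
        0.5 * (3 * se ** 2 - 1),           # ACN 6: R
        k * np.cos(az) * np.sin(2 * el),   # ACN 7: S
        k * np.cos(2 * az) * ce ** 2,      # ACN 8: U
    ])


def circular_steering(freq, radius, n_mics, az):
    """Far-field steering matrix (n_mics, len(az)) of a horizontal uniform circular array.

    Assumes plane waves at elevation 0 and the phase convention exp(+j k r cos(phi_m - az)).
    """
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    k = 2 * np.pi * freq / SPEED_OF_SOUND
    az = np.atleast_1d(np.asarray(az, dtype=float))
    return np.exp(1j * k * radius * np.cos(mic_angles[:, None] - az[None, :]))


def encode_two_stage(X, freqs, radius, grid_az):
    """Classical SP analogue (not the paper's DNN): mic STFTs X (M, F, T) -> SOA STFTs (9, F, T)."""
    n_mics, n_freqs, n_frames = X.shape
    Y = sh_soa(grid_az, 0.0)  # (9, D): SH weights of a horizontal direction grid
    B = np.zeros((9, n_freqs, n_frames), dtype=complex)
    for fi, f in enumerate(freqs):
        A = circular_steering(f, radius, n_mics, grid_az)  # (M, D)
        # Stage 1: least-squares plane-wave decomposition onto the grid directions
        amps, *_ = np.linalg.lstsq(A, X[:, fi, :], rcond=None)  # (D, T)
        # Stage 2: Ambisonic synthesis, weighting each plane wave by its SH values
        B[:, fi, :] = Y @ amps
    return B
```

For example, `encode_two_stage(X, freqs, 0.05, np.linspace(0, 2 * np.pi, 6, endpoint=False))` encodes an STFT `X` of shape (M, F, T). When a source lies exactly on the grid, stage 1 recovers its plane-wave amplitude and stage 2 reproduces its SOA encoding. Every grid direction has zero elevation, so the vertically odd channels Z, T and S (ACN 2, 5, 7) always come out zero. This is a simple illustration of why a horizontal array cannot resolve elevation on its own. At low frequencies the steering matrix becomes ill-conditioned (at 0 Hz all its columns are identical), so a practical SP encoder would typically need regularization there.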

Audio and Speech Processing

What problem does this paper attempt to address?

The main objective of this paper is to improve Ambisonic encoding in multi-speaker scenarios, in particular the encoding of full 3D Ambisonic signals with a horizontal circular microphone array. The authors propose a deep learning-based approach. Its main points are:

1. **Addressing the limitations of traditional methods**: Traditional Ambisonic encoding methods typically rely on spherical microphone arrays to capture the sound field efficiently, which limits their flexibility in practical applications. The paper develops a deep learning method to overcome this limitation.
2. **Introducing new techniques**:
   - **Two-stage network architecture**: Mimics plane wave decomposition followed by Ambisonic synthesis, so that spatial information is extracted from the microphone signals more effectively.
   - **Loss function based on spatial power maps**: Regularizes the inter-channel correlations of the Ambisonic signals.
   - **Channel permutation technique**: Resolves the ambiguity of encoding vertical information with a horizontal circular array.
3. **Performance improvement**: Evaluations on simulated speech and noise datasets show that the proposed method consistently outperforms existing traditional signal processing (SP) and deep learning (DL) methods, with significantly better audio quality (timbral and spatial) and higher sound source localization accuracy.

In summary, the paper uses deep learning to make Ambisonic encoding more effective in multi-speaker scenarios, especially when the capture device is a horizontal circular array.
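The two training-side ideas can be sketched in NumPy under explicit assumptions. The spatial power map is taken here as the steered-response power P(Ω) = y(Ω)^T R y(Ω) of a plain spherical-harmonic beam y(Ω), where R is the 9 × 9 inter-channel covariance of the Ambisonic signals. Since P depends on every entry of R, matching power maps constrains inter-channel correlations rather than each channel in isolation. For the vertical ambiguity, the sketch treats the target and its reflection through the horizontal plane (a sign flip of ACN channels 2, 5 and 7) as equally valid labels and keeps the smaller loss, in the style of permutation-invariant training. The beam, the L1 map distance, the min-over-mirror form and all names (`sh_soa`, `power_map`, `power_map_loss`, `mirror_vertical`, `mirror_invariant`, `ODD_IN_Z`) are assumptions for illustration. The paper's exact definitions may differ.

```python
import numpy as np

# ACN indices of the order-2 real SH that are odd under z -> -z: Z (2), T (5), S (7).
ODD_IN_Z = [2, 5, 7]


def sh_soa(az, el):
    """Real SH up to order 2 (ACN order, SN3D normalisation); returns shape (9, N)."""
    az, el = np.broadcast_arrays(np.atleast_1d(np.asarray(az, dtype=float)),
                                 np.atleast_1d(np.asarray(el, dtype=float)))
    ce, se, k = np.cos(el), np.sin(el), np.sqrt(3.0) / 2.0
    return np.stack([
        np.ones_like(az), np.sin(az) * ce, se, np.cos(az) * ce,         # W, Y, Z, X
        k * np.sin(2 * az) * ce ** 2, k * np.sin(az) * np.sin(2 * el),  # V, T
        0.5 * (3 * se ** 2 - 1), k * np.cos(az) * np.sin(2 * el),       # R, S
        k * np.cos(2 * az) * ce ** 2,                                   # U
    ])


def power_map(b, grid_az, grid_el):
    """Steered-response power P(d) = y_d^T R y_d of Ambisonic signals b (9, T) on a grid."""
    R = b @ b.conj().T / b.shape[1]  # (9, 9) inter-channel covariance
    Y = sh_soa(grid_az, grid_el)     # (9, D) SH beam weights per grid direction
    return np.real(np.einsum("id,ij,jd->d", Y, R, Y))


def power_map_loss(est, ref, grid_az, grid_el):
    """Mean absolute difference between power maps (assumed form of the map loss)."""
    return np.mean(np.abs(power_map(est, grid_az, grid_el) - power_map(ref, grid_az, grid_el)))


def mse(est, ref):
    """Plain channel-wise mean squared error."""
    return np.mean(np.abs(est - ref) ** 2)


def mirror_vertical(b):
    """Reflect the encoded sound field through the horizontal plane (elevation -> -elevation)."""
    out = np.array(b, copy=True)
    out[ODD_IN_Z] *= -1
    return out


def mirror_invariant(loss_fn, est, ref):
    """Keep the smaller loss over ref and its vertical mirror (assumed PIT-style scheme)."""
    return min(loss_fn(est, ref), loss_fn(est, mirror_vertical(ref)))
```

To use the map loss inside the permutation step, wrap it as `lambda e, r: power_map_loss(e, r, grid_az, grid_el)` and pass it to `mirror_invariant`. Consider a source at elevation +0.4 rad and an estimate encoded at -0.4 rad. The plain MSE and the plain map loss are both large, while their mirror-invariant versions are zero, so the network is not penalized for a choice of elevation sign that the horizontal array cannot observe.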
