Abstract: Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.

What problem does this paper attempt to address?

This paper addresses the problem of capturing audio with a specific directivity pattern using an array with only a small number of microphones, especially in far-field conditions. Traditional methods usually rely on explicit signal models or parametric sound-field models, and they may perform poorly in complex acoustic environments. In addition, existing methods often lack fine-grained control over the directivity pattern, and higher-order directivity patterns are difficult to achieve with a small number of microphones. The paper therefore proposes a directional filtering method based on deep neural networks (DNNs) that approximates the directivity pattern of a virtual directional microphone (VDM) by learning it implicitly from data. Specifically, a DNN estimates a single-channel complex mask from the microphone array signals, and the mask is applied to the reference microphone to generate a signal with the desired directivity pattern. The method not only achieves the desired directivity pattern but also realizes higher-order patterns with a small number of microphones, which is difficult for linear and parametric directional filtering methods.

### Main contributions

1. **Problem formalization**: Formalize the problem of learning the directivity pattern.
2. **Research on the composition of the training data set**: Explore the composition of the training data set and its impact on the directivity pattern achieved during DNN inference.
3. **Reduction in the number of microphones needed**: Demonstrate that the DNN method can achieve the desired directivity pattern with fewer microphones, surpassing classical signal processing methods.

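The two ideas described above, a target signal shaped by a directivity pattern and a complex mask applied to the reference microphone, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the pattern family `(alpha + (1 - alpha) * cos(theta)) ** order`, the function names `pattern_gain`, `vdm_target`, and `apply_mask`, and the toy signals. In the actual method the mask would come from the DNN rather than being supplied by hand.

```python
import math


def pattern_gain(theta, alpha=0.5, order=1):
    """Gain of a hypothetical axisymmetric pattern at angle theta (radians).

    alpha=0.5, order=1 gives a cardioid; a larger order narrows the beam.
    """
    return (alpha + (1.0 - alpha) * math.cos(theta)) ** order


def vdm_target(sources, angles, alpha=0.5, order=1):
    """Sum the source signals, each weighted by the pattern gain at its angle.

    sources: list of equal-length sample lists (each source as observed at
             the reference microphone); angles: one angle per source.
    """
    length = len(sources[0])
    gains = [pattern_gain(a, alpha, order) for a in angles]
    return [sum(g * s[n] for g, s in zip(gains, sources)) for n in range(length)]


def apply_mask(mask, reference_stft):
    """Element-wise product of a complex mask and the reference STFT.

    Both arguments are nested lists indexed as [frequency][time].
    """
    return [[m * x for m, x in zip(mask_row, ref_row)]
            for mask_row, ref_row in zip(mask, reference_stft)]


# Toy usage: a third-order pattern keeps the frontal source and nulls the rear one.
target = vdm_target([[1.0, 2.0], [3.0, 4.0]], [0.0, math.pi], order=3)
masked = apply_mask([[1j, 0.5]], [[2 + 0j, 4 + 0j]])
```

In this sketch a training target would be built by `vdm_target`, while `apply_mask` shows the inference-time step that produces the output from the reference channel.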
### Method overview

- **Architecture**: Adopt the FT-JNF architecture, which models the spectral-spatial relationship with a bidirectional LSTM layer, then models the temporal relationship with a unidirectional LSTM layer, and finally computes the complex-valued single-channel mask with a linear layer.
- **Loss function**: Use the source-aggregated and regularized thresholded signal-to-distortion ratio (SA-ε-tSDR) as the loss function to keep training stable and effective.
- **Training strategy**: Construct the training data set by simulating multi-speaker scenarios and densely sampling source positions across the desired directivity pattern.

### Experimental setup and results

- **Experimental setup**: Use the LibriSpeech database to generate training, validation, and test sets, and simulate scenarios with different numbers of speakers.
- **Performance evaluation**: Evaluate the method with the signal-to-distortion ratio (SDR). The results show that the proposed method is significantly better than the baseline method, especially in multi-speaker scenarios.

In summary, this research explores the feasibility of using a DNN for directivity pattern learning, especially for time-invariant patterns, and demonstrates its superior performance in far-field conditions.
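As a rough illustration of the SDR metric mentioned above, the following sketch computes a plain time-domain SDR in decibels. It is not the SA-ε-tSDR loss used for training, whose exact definition is not given in this summary. The function name `sdr_db` and the small stabilizing constant `eps` are assumptions made for this example.

```python
import math


def sdr_db(target, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB between two equal-length signals.

    eps is an assumed small constant that avoids division by zero and log(0).
    """
    signal_energy = sum(t * t for t in target)
    error_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy usage: an estimate that is 10% too quiet yields roughly 20 dB.
score = sdr_db([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0])
```

A higher value means the estimate is closer to the target. In the setting described above, the target would be the signal a microphone with the desired directivity pattern would have captured.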

Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array

Microphone array processing via joint wideband angle-of-arrival estimation and speech feature enhancement

Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information

Neural Ambisonics encoding for compact irregular microphone arrays

Reconstructing the Dynamic Directivity of Unconstrained Speech

Optimal Two-Layer Directive Microphone Array with Application in Near-Field Acoustical Holography

A High-Resolution and Low-Frequency Acoustic Beamforming Based on Bayesian Inference and Non-Synchronous Measurements

ACP1–ADA1 interaction in type 2 diabetes: a study in coronary artery disease

Reactive Near-Field to 3-Meter Far-Field Transformation Based on Deep Convolutional Neural Networks and Plane Wave Spectrum

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

On Directivity of A Circular Array with Directional Microphones

Speaker localization using direct path dominance test based on sound field directivity

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

Innovative Directional Encoding in Speech Processing: Leveraging Spherical Harmonics Injection for Multi-Channel Speech Enhancement

A circular microphone array with virtual microphones based on acoustics-informed neural networks

Microphone Array Generalization for Multichannel Narrowband Deep Speech Enhancement

Inference-Adaptive Neural Steering for Real-Time Area-Based Sound Source Separation

Study on the Directing Performance of the Linear Microphone Array

Acoustic source imaging using densely connected convolutional networks

All Neural Low-latency Directional Speech Extraction