Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array

Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets
2024-09-20
Abstract: Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses how to capture audio with a specific directivity pattern using an array of only a few microphones, particularly under far-field conditions. Traditional methods usually rely on explicit signal models or parametric sound-field models, which may perform poorly in complex acoustic environments. Existing methods also often lack fine-grained control over the directivity pattern and struggle to realize higher-order patterns with a small number of microphones.

To address this, the paper proposes a directional filtering method based on a deep neural network (DNN) that learns, implicitly from data, to approximate the directivity pattern of a virtual directional microphone (VDM). Specifically, the DNN estimates a single-channel complex mask from the microphone array signals, and the mask is applied to a reference microphone to produce a signal with the desired directivity pattern. The method can also realize higher-order directivity patterns with a small number of microphones, which is difficult for linear and parametric directional filtering methods.

### Main contributions

1. **Problem formalization**: Formalizes the problem of learning a directivity pattern.
2. **Training dataset composition**: Investigates how the composition of the training dataset affects the directivity pattern the DNN realizes at inference.
3. **Fewer microphones required**: Demonstrates that the DNN method can achieve the desired directivity pattern with fewer microphones, surpassing classical signal processing methods.
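
The mask-and-render idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the order-n cardioid pattern, the omnidirectional reference microphone, and the function names `order_n_cardioid`, `vdm_target`, and `apply_mask` are all assumptions made for illustration. An oracle mask, computed from the known target, stands in for the DNN output.

```python
import numpy as np

def order_n_cardioid(theta, order=1):
    """Hypothetical desired pattern: ((1 + cos(theta)) / 2) ** order.

    theta is the source direction in radians relative to the look direction.
    """
    return (0.5 * (1.0 + np.cos(theta))) ** order

def vdm_target(source_stfts, doas, order=1):
    """Virtual directional microphone signal (assumed far-field model).

    Each source, as observed at the reference microphone, is scaled by the
    pattern gain for its direction of arrival, then all sources are summed.
    source_stfts: complex array of shape (n_sources, n_freq, n_frames).
    """
    gains = order_n_cardioid(np.asarray(doas), order)   # (n_sources,)
    return np.tensordot(gains, source_stfts, axes=1)    # (n_freq, n_frames)

def apply_mask(mask, ref_stft):
    """Apply a single-channel complex mask to the reference-microphone STFT."""
    return mask * ref_stft

# Toy scene: two sources with random complex spectra (placeholder data).
rng = np.random.default_rng(0)
shape = (2, 5, 4)  # (n_sources, n_freq, n_frames)
sources = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
doas = [0.0, np.pi / 2]  # one source on-axis, one at 90 degrees

# Assumed omnidirectional reference microphone: it sees the plain sum.
ref = sources.sum(axis=0)
target = vdm_target(sources, doas, order=2)

# Oracle mask: in the described method a DNN would estimate this quantity
# from the multichannel array signals instead.
oracle_mask = target / ref
rendered = apply_mask(oracle_mask, ref)
```

With an order of 2, the source at 90 degrees is attenuated by a factor of 0.25 while the on-axis source passes unchanged, which is the kind of higher-order behavior the summary says is hard to obtain from a few microphones with linear filtering.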
### Method overview

- **Architecture**: The method adopts the FT-JNF architecture. A bidirectional LSTM layer models spectral-spatial relationships, a unidirectional LSTM layer then models temporal relationships, and a final linear layer computes the complex-valued single-channel mask.
- **Loss function**: The source-aggregated, regularized, thresholded signal-to-distortion ratio (SA-ε-tSDR) serves as the loss function to keep training stable and effective.
- **Training strategy**: The training dataset is built by simulating multi-speaker scenarios and densely sampling source positions over the desired directivity pattern.

### Experimental setup and results

- **Experimental setup**: The LibriSpeech database is used to generate the training, validation, and test sets, simulating scenarios with different numbers of speakers.
- **Performance evaluation**: Performance is measured by the signal-to-distortion ratio (SDR). The results show that the proposed method is significantly better than the baseline method, especially in multi-speaker scenarios.

In summary, this research explores the feasibility of using a DNN to learn directivity patterns, especially time-invariant ones, and demonstrates its superior performance in far-field conditions.
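
The SDR metric mentioned above can be sketched in a few lines of NumPy. The first function is the standard SDR in decibels. The second is only one plausible reading of a "regularized, thresholded" SDR loss and is an assumption, not the paper's SA-ε-tSDR definition: the names `thresholded_sdr_loss`, `eps`, and `max_sdr_db` are hypothetical, and the source-aggregation step is not modeled.

```python
import numpy as np

def sdr_db(target, estimate):
    """Signal-to-distortion ratio in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    target = np.asarray(target, dtype=float)
    error = target - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(error ** 2))

def thresholded_sdr_loss(target, estimate, eps=1e-8, max_sdr_db=30.0):
    """Hypothetical negative SDR with regularization and a soft ceiling.

    eps keeps the ratio finite when the target is silent, and the tau term
    caps the achievable SDR near max_sdr_db so that near-perfect examples
    stop dominating the gradient. Both choices are assumptions.
    """
    target = np.asarray(target, dtype=float)
    error = target - np.asarray(estimate, dtype=float)
    tau = 10.0 ** (-max_sdr_db / 10.0)
    signal_energy = np.sum(target ** 2)
    error_energy = np.sum(error ** 2)
    ratio = (signal_energy + eps) / (error_energy + tau * signal_energy + eps)
    return -10.0 * np.log10(ratio)

# Toy check: an estimate scaled by 0.9 leaves 10% amplitude error (20 dB SDR).
t = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
sdr_scaled = sdr_db(t, 0.9 * t)
loss_perfect = thresholded_sdr_loss(t, t)                       # close to -30
loss_silent = thresholded_sdr_loss(np.zeros(8), np.zeros(8))    # stays finite
```

The plain SDR diverges for a perfect estimate and is undefined for a silent target, which is why a regularized and thresholded variant is a natural choice for a training loss when the desired pattern has nulls that silence some sources.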