SANN-PSZ: Spatially Adaptive Neural Network for Head-Tracked Personal Sound Zones

Yue Qiao, Edgar Choueiri
2024-11-02
Abstract: A deep learning framework for dynamically rendering personal sound zones (PSZs) with head tracking is presented, utilizing a spatially adaptive neural network (SANN) that inputs listeners' head coordinates and outputs PSZ filter coefficients. The SANN model is trained using either simulated acoustic transfer functions (ATFs) with data augmentation for robustness in uncertain environments or a mix of simulated and measured ATFs for customization under known conditions. It is found that augmenting room reflections in the training data can more effectively improve the model robustness than augmenting the system imperfections, and that adding constraints such as filter compactness to the loss function does not significantly affect the model's performance. Comparisons of the best-performing model with traditional filter design methods show that, when no measured ATFs are available, the model yields equal or higher isolation in an actual room environment with fewer filter artifacts. Furthermore, the model achieves significant data compression (100x) and computational efficiency (10x) compared to the traditional methods, making it suitable for real-time rendering of PSZs that adapt to the listeners' head movements.
Audio and Speech Processing
### What problem does this paper attempt to solve?

This paper addresses how to render personal sound zones (PSZs) adaptively in a dynamic environment using a deep-learning framework. Specifically, it proposes a method based on a Spatially Adaptive Neural Network (SANN) that generates and updates PSZ filter coefficients in real time according to the listeners' head positions. When no measured acoustic transfer functions (ATFs) are available, the method yields equal or higher isolation than traditional filter design methods in an actual room environment, with fewer filter artifacts, while achieving significant data compression (100x) and computational efficiency (10x).

#### Main problems include:

1. **Limitations of static PSZs**:
   - Existing PSZ methods assume that PSZs are fixed spatial regions and cannot be dynamically adjusted as the listener moves.
   - In practical applications, such as home entertainment, the listener may move freely over a large area, and a static PSZ setting may not be sufficient.
2. **Deficiencies in traditional filter design methods**:
   - Traditional methods, such as Acoustic Contrast Control (ACC) and Pressure Matching (PM), have limitations in dealing with uncertainty and complex environments.
   - Pre-calculated filters require a large amount of data storage and complex interpolation algorithms, while filters calculated in real time incur high computational costs.
3. **System robustness and customization**:
   - How to ensure that the generated filters remain effective in the face of system uncertainties (such as room reflections and system errors).
   - How to use measurement data to customize the model to specific system conditions.

#### Solutions proposed in the paper:

- **SANN model**: A deep-learning model is trained to take the listeners' head coordinates as input and output the corresponding PSZ filter coefficients. This enables the PSZs to be updated in real time as the listeners' heads move.
- **Data augmentation and mixed training**: The model is trained on either simulated ATFs with data augmentation, for robustness in uncertain environments, or a mix of simulated and measured ATFs, for customization under known conditions.
- **Optimized loss function**: The loss function combines multiple terms, including amplitude matching, energy minimization, gain limitation, and filter compactness, to control the quality and performance of the filters.

Through these methods, the proposed framework yields equal or higher isolation than traditional methods when no measured ATFs are available, and it achieves significant data compression and improved computational efficiency, making it suitable for real-time rendering scenarios.
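The SANN idea summarized above, a network that maps head coordinates to filter coefficients, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' architecture: the four-value input (x, y for each of two listeners), the single hidden layer, the 8 loudspeakers and 32 taps, the random untrained weights, and the helper names `init_mlp`, `forward`, `head_coords_to_filters`, and `compactness_penalty`. The compactness term is one plausible way to penalize late filter energy (an energy-weighted mean tap index), not the definition used in the paper.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create randomly initialised dense layers; sizes = [n_in, hidden..., n_out]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Dense forward pass: tanh on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def head_coords_to_filters(layers, head_coords, n_speakers, n_taps):
    """Map head coordinates to one FIR filter (list of taps) per loudspeaker."""
    flat = forward(layers, head_coords)
    if len(flat) != n_speakers * n_taps:
        raise ValueError("network output size must equal n_speakers * n_taps")
    return [flat[s * n_taps:(s + 1) * n_taps] for s in range(n_speakers)]


def compactness_penalty(taps):
    """Hypothetical compactness term: energy-weighted mean tap index.

    Smaller values mean the filter energy is concentrated in early taps.
    """
    energy = sum(h * h for h in taps)
    if energy == 0.0:
        return 0.0
    return sum(n * h * h for n, h in enumerate(taps)) / energy


# Usage: untrained weights, assumed input layout (x1, y1, x2, y2) in metres.
layers = init_mlp([4, 16, 8 * 32], seed=0)
filters = head_coords_to_filters(layers, [0.10, -0.25, 0.40, 0.05],
                                 n_speakers=8, n_taps=32)
penalties = [compactness_penalty(f) for f in filters]
```

Because the filters are produced by a single forward pass, only the network weights need to be stored rather than a filter set per head position, which is the intuition behind the data compression and computational efficiency reported in the abstract. A trained model would replace the random weights, and the loss would combine the compactness term with the other terms listed above.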