Abstract: A deep learning framework for dynamically rendering personal sound zones (PSZs) with head tracking is presented, utilizing a spatially adaptive neural network (SANN) that inputs listeners' head coordinates and outputs PSZ filter coefficients. The SANN model is trained using either simulated acoustic transfer functions (ATFs) with data augmentation for robustness in uncertain environments or a mix of simulated and measured ATFs for customization under known conditions. It is found that augmenting room reflections in the training data can more effectively improve the model robustness than augmenting the system imperfections, and that adding constraints such as filter compactness to the loss function does not significantly affect the model's performance. Comparisons of the best-performing model with traditional filter design methods show that, when no measured ATFs are available, the model yields equal or higher isolation in an actual room environment with fewer filter artifacts. Furthermore, the model achieves significant data compression (100x) and computational efficiency (10x) compared to the traditional methods, making it suitable for real-time rendering of PSZs that adapt to the listeners' head movements.

What problem does this paper attempt to address?

This paper addresses how to render personal sound zones (PSZs) adaptively in a dynamic environment using a deep-learning framework. Specifically, it proposes a Spatially Adaptive Neural Network (SANN) that generates and updates PSZ filter coefficients in real time according to the listener's head position. When no measured acoustic transfer functions are available, the method yields equal or higher isolation in an actual room environment with fewer filter artifacts than traditional filter design methods, while achieving significant data compression (100x) and computational efficiency (10x).

#### Main problems include:

1. **Limitations of static PSZs**:
   - Existing PSZ methods assume that PSZs are fixed spatial regions, so they cannot be adjusted dynamically as the listener moves.
   - In practical applications such as home entertainment, the listener may move freely over a large area, and a static PSZ setting may not meet the requirements.
2. **Deficiencies in traditional filter design methods**:
   - Traditional methods, such as Acoustic Contrast Control (ACC) and Pressure Matching (PM), have limitations in dealing with uncertainty and complex environments.
   - Pre-calculated filters require a large amount of data storage and complex interpolation algorithms, while filters calculated in real time incur high computational costs.
3. **System robustness and customization**:
   - The generated filters must remain effective under system uncertainties such as room reflections and system imperfections.
   - Measurement data should be usable to customize the model for specific system conditions.

#### Solutions proposed in the paper:

- **SANN model**: A trained deep-learning model takes the listener's head coordinates as input and outputs the corresponding PSZ filter coefficients. This enables the PSZ to be updated in real time according to the listener's head movement.
- **Data augmentation and mixed training**: The model is trained on simulated Acoustic Transfer Functions (ATFs) with data augmentation, or on a mix of simulated and measured ATFs, to enhance its robustness and adaptability.
- **Optimized loss function**: Multiple loss terms, including amplitude matching, energy minimization, gain limitation, and filter compactness, are introduced to ensure the quality and performance of the filters.

Through these methods, the framework proposed in the paper not only improves the isolation of PSZs but also achieves significant data compression and improved computational efficiency, making it suitable for real-time rendering scenarios.
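The mapping from head coordinates to filter coefficients, and the four loss terms named above, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the paper's implementation: the network sizes, the `tanh` activation, the unit target amplitude, the gain ceiling `g_max`, the tap-index weighting used for compactness, and the term weights are all assumptions made here for the sake of a runnable example, as are all function names. The loss is evaluated at a single frequency for brevity.

```python
import cmath
import math
import random

def init_mlp(sizes, seed=0):
    """Random weights for a small fully connected network (illustrative only)."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def predict_filters(head_coords, layers, n_speakers, n_taps):
    """Map head coordinates to one FIR filter (list of taps) per loudspeaker."""
    x = list(head_coords)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # tanh on hidden layers, linear output layer
            x = [math.tanh(v) for v in x]
    assert len(x) == n_speakers * n_taps
    return [x[s * n_taps:(s + 1) * n_taps] for s in range(n_speakers)]

def freq_response(taps, omega):
    """Response of one FIR filter at normalized angular frequency omega."""
    return sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))

def zone_pressures(atfs, gains):
    """Pressure at each control point: sum over loudspeakers of ATF * gain."""
    return [sum(h * g for h, g in zip(row, gains)) for row in atfs]

def psz_loss(filters, atf_bright, atf_dark, omega, g_max=2.0,
             weights=(1.0, 1.0, 0.1, 0.01)):
    """Weighted sum of four assumed loss terms at one frequency."""
    gains = [freq_response(taps, omega) for taps in filters]
    p_bright = zone_pressures(atf_bright, gains)
    p_dark = zone_pressures(atf_dark, gains)
    # Amplitude matching: bright-zone magnitude should reach the target (1.0).
    match = sum((abs(p) - 1.0) ** 2 for p in p_bright) / len(p_bright)
    # Energy minimization: mean squared pressure in the dark zone.
    energy = sum(abs(p) ** 2 for p in p_dark) / len(p_dark)
    # Gain limitation: penalize filter gains above the assumed ceiling g_max.
    gain = sum(max(0.0, abs(g) - g_max) ** 2 for g in gains)
    # Compactness (assumed form): penalize energy in late filter taps.
    compact = sum((n / max(len(taps) - 1, 1)) ** 2 * t * t
                  for taps in filters for n, t in enumerate(taps))
    terms = (match, energy, gain, compact)
    return sum(w * t for w, t in zip(weights, terms)), terms

# Hypothetical setup: 6 input coordinates, 2 loudspeakers, 8 taps each,
# one control point per zone with made-up ATF values.
layers = init_mlp([6, 16, 2 * 8])
filters = predict_filters([0.1, -0.2, 0.0, 0.4, 0.3, 0.0], layers,
                          n_speakers=2, n_taps=8)
atf_bright = [[1.0 + 0.0j, 0.5j]]
atf_dark = [[0.5j, 1.0 + 0.0j]]
total, terms = psz_loss(filters, atf_bright, atf_dark, omega=math.pi / 4)
```

In a training loop, the ATFs would come from simulation (optionally augmented) or measurement, the loss would be summed over frequencies and head positions, and a tensor library with automatic differentiation would replace the list arithmetic used here.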

SANN-PSZ: Spatially Adaptive Neural Network for Head-Tracked Personal Sound Zones

Acoustic Field Visualization and Source Localization Via Physics-Informed Learning of Sparse Data with Adaptive Sampling

Robust Fixed-Filter Sound Zone Control with Audio-Based Position Tracking

SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

Sound Field Reconstruction Using a Compact Acoustics-informed Neural Network

A Distributed Algorithm for Personal Sound Zones Systems

Binaural Rendering of Ambisonic Signals by Neural Networks

Dynamic-Structured Reservoir Spiking Neural Network in Sound Localization

Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics

SpatialCodec: Neural Spatial Speech Coding

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

An Active Noise Control System Based on Soundfield Interpolation Using a Physics-informed Neural Network

Evaluating Spatial Hearing Using a Dual-Task Approach in a Virtual-Acoustics Environment

Spherical Convolutional Recurrent Neural Network for Real-Time Sound Source Tracking

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

SpaIn-Net: Spatially-Informed Stereophonic Music Source Separation

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations

Spatially constrained vs. unconstrained filtering in neural spatiospectral filters for multichannel speech enhancement