Universal Spatial Audio Transcoder

Amaia Sagasti, Davide Scaini, Daniel Arteaga
2024-05-15
Abstract: This paper addresses the challenges associated with both the conversion between different spatial audio formats and the decoding of a spatial audio format to a specific loudspeaker layout. Existing approaches often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To overcome these challenges, we present the Universal Spatial Audio Transcoder (USAT) method and its corresponding open-source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format, adapting it to any output format or 2D/3D loudspeaker configuration. Drawing upon optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information. We present examples of the decoding and transcoding of several audio formats, and show that the USAT approach is advantageous compared to the most common methods in the field.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses two related challenges: transcoding between spatial audio formats, and decoding a spatial audio format to a specific loudspeaker layout. Existing methods often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To tackle these issues, the authors propose the Universal Spatial Audio Transcoder (USAT) method and its open-source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format and adapts it to any output format or 2D/3D loudspeaker configuration. By using optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information.

### Summary of Main Content

- **Background Introduction**: The paper first introduces the various spatial audio formats, including layout-independent formats (such as Ambisonics) and layout-specific formats (such as traditional multichannel configurations). It points out that real-world playback setups often differ from the intended ones, so the content must be adapted to preserve spatial information and overall auditory quality.
- **Limitations of Existing Methods**: Existing methods can perform format conversion, but the result may not be optimal from a psychoacoustic perspective.
- **USAT Method**: The paper proposes USAT, which optimizes the decoding and transcoding of spatial audio based on psychoacoustic principles.
- **Experimental Validation**: The paper applies USAT to the decoding and transcoding of several audio formats and reports that it outperforms existing methods.

### Specific Implementation

- **Algorithm Steps**:
  1. **Matrix Initialization**: Calculate or initialize the encoding matrix, the transcoding matrix, and the decoding-to-speaker matrix.
  2. **Cost Function Setup**: Establish a cost function based on psychoacoustic principles.
  3. **Cost Function Minimization**: Minimize the cost function by optimizing the transcoding matrix.

### Experimental Results

- **Case Studies**:
  1. **5OA Decoding to 7.0.4**: USAT outperforms the existing AllRAD method on multiple metrics.
  2. **7.0.4 Transcoding to 5OA**: USAT performs better in terms of pressure level and ASW.
  3. **5.0.2 Decoding to Irregular 3.0.1**: USAT outperforms direct channel remapping on all three metrics.
  4. **Audio Object Decoding to 5.0**: Demonstrates the advantages of USAT when handling audio objects.

Based on these case studies, the paper concludes that USAT is effective and compares favorably with the existing methods.
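The three algorithm steps listed under Specific Implementation (matrix initialization, cost function setup, cost function minimization) can be illustrated with a small sketch. It is not the USAT implementation: the summary does not give the actual cost function, so the three terms used here (energy, pressure, and an energy-weighted direction vector) are assumed stand-ins for the psychoacoustic criteria. The input format (horizontal first-order Ambisonics), the 5.0 speaker angles, and all function names are hypothetical choices for illustration. Because the output is already a loudspeaker layout, the decoding-to-speaker matrix is taken to be the identity, and the optimizer is plain gradient descent with step halving.

```python
import numpy as np

def encode_foa_2d(angles_rad):
    """Encoding matrix: horizontal first-order Ambisonics gains (W, X, Y) per angle."""
    return np.stack([np.ones_like(angles_rad), np.cos(angles_rad), np.sin(angles_rad)])

def unit_vectors(angles_rad):
    """Unit vectors (x, y) for each angle, shape (N, 2)."""
    return np.stack([np.cos(angles_rad), np.sin(angles_rad)], axis=1)

def cost_and_grad(T, E, U, V):
    """Assumed cost (not the paper's): mean squared error of energy, pressure,
    and energy-weighted direction over all test directions, plus its gradient."""
    G = T @ E                          # speaker gains, shape (speakers, directions)
    err_e = (G ** 2).sum(axis=0) - 1.0 # total energy should be 1
    err_p = G.sum(axis=0) - 1.0        # total pressure should be 1
    err_v = (G ** 2).T @ U - V         # energy-weighted direction should match source
    n_dirs = E.shape[1]
    cost = (err_e ** 2 + err_p ** 2 + (err_v ** 2).sum(axis=1)).mean()
    dG = (4 * G * err_e + 2 * err_p + 4 * G * (U @ err_v.T)) / n_dirs
    return cost, dG @ E.T              # chain rule through G = T @ E

def optimize(T0, E, U, V, iters=500):
    """Gradient descent with step halving; the cost never increases."""
    T = T0.copy()
    cost, grad = cost_and_grad(T, E, U, V)
    step = 1.0
    for _ in range(iters):
        while step > 1e-12:
            T_try = T - step * grad
            cost_try, grad_try = cost_and_grad(T_try, E, U, V)
            if cost_try < cost:
                break
            step *= 0.5
        else:
            break                      # no improving step found: stop
        T, cost, grad = T_try, cost_try, grad_try
        step *= 2.0                    # try a larger step next time
    return T, cost

# Step 1: matrix initialization (hypothetical 5.0 layout, 5-degree direction grid).
speakers = np.deg2rad(np.array([0.0, 30.0, -30.0, 110.0, -110.0]))
directions = np.deg2rad(np.arange(0.0, 360.0, 5.0))
E = encode_foa_2d(directions)                  # encoding matrix, (3, 72)
U = unit_vectors(speakers)                     # speaker directions, (5, 2)
V = unit_vectors(directions)                   # target directions, (72, 2)
T0 = encode_foa_2d(speakers).T / len(speakers) # simple projection decoder, (5, 3)

# Steps 2 and 3: evaluate the cost, then minimize it over the transcoding matrix.
initial_cost, _ = cost_and_grad(T0, E, U, V)
T_opt, final_cost = optimize(T0, E, U, V)
print(f"cost before: {initial_cost:.4f}, after: {final_cost:.4f}")
```

The design point this sketch shares with the description above is that the quantity being optimized is the matrix itself, evaluated over a cloud of source directions, rather than a fixed channel remapping. A practical optimizer would likely use a quasi-Newton method and perceptually motivated weights in place of the equal weights assumed here.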