The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing

Christopher Tralie, Ben Cantil
2024-11-07
Abstract: We present "The Concatenator," a real time system for audio-guided concatenative synthesis. Similarly to Driedger et al.'s "musaicing" (or "audio mosaicing") technique, we concatenate a set number of windows within a corpus of audio to re-create the harmonic and percussive aspects of a target audio stream. Unlike Driedger's NMF-based technique, however, we instead use an explicitly Bayesian point of view, where corpus window indices are hidden states and the target audio stream is an observation. We use a particle filter to infer the best hidden corpus states in real-time. Our transition model includes a tunable parameter to control the time-continuity of corpus grains, and our observation model allows users to prioritize how quickly windows change to match the target. Because the computational complexity of the system is independent of the corpus size, our system scales to corpora that are hours long, which is an important feature in the age of vast audio data collections. Within The Concatenator module itself, composers can vary grain length, fit to target, and pitch shift in real time while reacting to the sounds they hear, enabling them to rapidly iterate ideas. To conclude our work, we evaluate our system with extensive quantitative tests of the effects of parameters, as well as a qualitative evaluation with artistic insights. Based on the quality of the results, we believe the real-time capability unlocks new avenues for musical expression and control, suitable for live performance and modular synthesis integration, which furthermore represents an essential breakthrough in concatenative synthesis technology.
Sound, Information Retrieval, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
The paper "The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing" addresses several challenges in real-time, audio-guided concatenative synthesis:

1. **Real-time performance**: The paper proposes a system, "The Concatenator," that performs audio-guided concatenative synthesis in real time. Unlike traditional non-negative matrix factorization (NMF)-based methods, it does not require pre-training and can adapt to any audio corpus at runtime.
2. **Large-scale audio data processing**: Modern music producers face large amounts of audio data from cloud services, multi-sample libraries, or their own multi-track recordings. The proposed method handles audio corpora that are several hours long without significantly increasing the computational complexity.
3. **Enhancing musical expression and control**: By combining Bayesian inference with a particle filter, The Concatenator lets users adjust grain length, fit to target, and pitch shift in real time. This gives music creators new means of expression and control, and is especially suited to live performance and modular synthesis integration.
4. **Maintaining timbre characteristics**: When pursuing a closer spectral fit, traditional methods may lose the timbre of the original audio fragments. The proposed method achieves a reasonable spectral fit while preserving timbre by adjusting parameters such as the time-continuity parameter \( p_d \) and the temperature parameter \( \tau \).
5. **Optimizing computational efficiency**: The algorithm's computational complexity is independent of the corpus size, which makes large corpora practical to process. For example, for a 60-minute corpus, using 1000 particles and 5 windows per particle, it is nearly 30 times faster than Driedger's method.

### Formula summary

- **KL-divergence loss function**:
  \[ D(V \| WH) = \sum_{m,t} \left( V_{mt} \log\left(\frac{V_{mt}}{(WH)_{mt}}\right) - V_{mt} + (WH)_{mt} \right) \]
  where \( V \) is the target spectrogram, \( W \) is the corpus spectral template, and \( H \) is the learned activation matrix.
- **Update rule** (multiplicative update of \( H \) at iteration \( \ell \)):
  \[ H^\ell_{kt} = H^{\ell - 1}_{kt} \left( \frac{\sum_m W_{mk} V_{mt} / (WH^{\ell - 1})_{mt}}{\sum_m W_{mk}} \right) \]
- **State transition probability**:
  \[ p_T(\vec{s}_t = \vec{b} | \vec{s}_{t - 1} = \vec{a}) = \prod_{k = 0}^{p - 1} \begin{cases} p_d & \text{if } b[k] = a[k] + 1 \\ \frac{1 - p_d}{N - 1} & \text{otherwise} \end{cases} \]
  where \( p_d \) is the time-continuity parameter.
- **Observation probability**:
  \[ p_O[i] = \frac{e^{-\tau D_i}}{\sum_j e^{-\tau D_j}} \]
  where \( D_i \) is the KL loss between the spectral approximation \( \vec{\Lambda}_i \) of the \( i \)-th particle and the target \( \vec{v}_t \), and \( \tau \) is the temperature parameter.

Together, these design choices let The Concatenator run in real time while also scaling to large corpora, preserving timbre, and giving users live control over the result.
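
The four formulas above can be made concrete with a short NumPy sketch. It is illustrative only and is not the authors' implementation: the function names, the `eps` guard against division by zero, the wrap-around at the end of the corpus, and the reading of \( N \) as the number of corpus windows are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    V = np.asarray(V, dtype=float)
    WH = np.asarray(WH, dtype=float)
    # eps (an assumption of this sketch) avoids log(0) and division by zero.
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def update_H(V, W, H, eps=1e-12):
    """One multiplicative update of the activations H with W held fixed."""
    WH = W @ H + eps
    numer = W.T @ (V / WH)                 # sum_m W_mk * V_mt / (WH)_mt
    denom = W.sum(axis=0)[:, None] + eps   # sum_m W_mk
    return H * numer / denom


def sample_transition(states, p_d, N, rng):
    """Sample new window indices from the state transition probability.

    Each index advances to the next window with probability p_d; otherwise
    it jumps uniformly to one of the other N - 1 windows. Wrapping around
    at the end of the corpus is an assumption of this sketch.
    """
    states = np.asarray(states)
    nxt = (states + 1) % N
    # An offset in 1..N-1 guarantees the jump target differs from nxt.
    offsets = rng.integers(1, N, size=states.shape)
    jump = (nxt + offsets) % N
    advance = rng.random(states.shape) < p_d
    return np.where(advance, nxt, jump)


def observation_weights(D, tau):
    """Softmax of -tau * D over particles (the observation probability)."""
    D = np.asarray(D, dtype=float)
    # Subtracting the minimum leaves the ratio unchanged but avoids underflow.
    w = np.exp(-tau * (D - D.min()))
    return w / w.sum()
```

In this sketch, a larger `p_d` makes particles follow consecutive corpus windows more often, and a larger `tau` concentrates the weights on the particles with the lowest KL loss, which mirrors the roles the summary assigns to these two parameters.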