Abstract: In our demo, participants are invited to explore the Diff-MSTC prototype, which integrates the Diff-MST model into Steinberg's digital audio workstation (DAW), Cubase. Diff-MST, a deep learning model for mixing style transfer, predicts mixing console parameters for tracks using a reference song. The system processes up to 20 raw tracks along with a reference song to predict mixing console parameters that can be used to create an initial mix. Users have the option to manually adjust these parameters further for greater control. In contrast to earlier deep learning systems that are limited to research ideas, Diff-MSTC is a first-of-its-kind prototype integrated into a DAW. This integration facilitates mixing decisions on multitracks and lets users input context through a reference song, followed by fine-tuning of audio effects in a traditional manner.

What problem does this paper attempt to address?

This paper addresses how deep learning can automate multi-track music mixing in music production while keeping the user in control. Specifically, it introduces a prototype system called Diff-MSTC, which integrates the Diff-MST model into Steinberg's digital audio workstation (DAW) Cubase to achieve reference song-based mixing style transfer.

### Main Issues:

1. **Automation and Controllability**: Existing automatic mixing systems often lack user control, while professional mixing engineers need to balance automation against manual adjustment. Diff-MSTC aims to provide a system that is both automated and manually adjustable.
2. **Context Awareness**: Traditional automatic mixing systems often overlook the importance of context in the mixing process. Diff-MSTC uses a reference song to convey the user's intent and generates initial mixing parameters accordingly.
3. **Practical Application**: Although many deep learning-based mixing systems have been proposed, few can operate in an actual DAW environment. Diff-MSTC is a prototype integrated into Cubase, allowing for testing and application in real workflows.

### Solution:

- **Diff-MST Model**: This model uses deep learning to predict the parameters of the mixing console, which can be used to generate the initial mix. The reference song supplies the context that helps the model infer the user's intent.
- **Integration into Cubase**: Diff-MSTC, as a plugin for Cubase, allows users to use the system in an actual music production environment. Users can select a segment of a reference song and choose multi-track audio from the project, and the system generates the corresponding mixing parameters.
- **User Interface**: The system provides a user-friendly interface where users can select reference songs, choose audio segments, and view and adjust the generated mixing parameters.

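
The flow described under Solution (predicted console parameters produce an initial mix, which the user can then fine-tune) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `ChannelParams`, `apply_channel`, `mix`, and the choice of gain and pan as the only parameters are hypothetical and do not represent the Diff-MST parameter set or any Cubase API. Only the 20-track limit comes from the abstract.

```python
import math
from dataclasses import dataclass

MAX_TRACKS = 20  # the abstract states the system processes up to 20 raw tracks


@dataclass
class ChannelParams:
    """Hypothetical per-track console settings (not the real Diff-MST parameter set)."""
    gain_db: float = 0.0  # fader gain in decibels
    pan: float = 0.5      # 0.0 = hard left, 0.5 = centre, 1.0 = hard right


def apply_channel(samples, params):
    """Apply gain and constant-power panning to a mono track; returns (left, right)."""
    gain = 10.0 ** (params.gain_db / 20.0)
    theta = params.pan * math.pi / 2.0
    left_gain = gain * math.cos(theta)
    right_gain = gain * math.sin(theta)
    return [s * left_gain for s in samples], [s * right_gain for s in samples]


def mix(tracks, predicted, overrides=None):
    """Sum all tracks into a stereo mix; user overrides replace predicted settings."""
    if len(tracks) > MAX_TRACKS:
        raise ValueError(f"at most {MAX_TRACKS} tracks are supported")
    if len(tracks) != len(predicted):
        raise ValueError("need one parameter set per track")
    overrides = overrides or {}
    length = max((len(t) for t in tracks), default=0)
    left, right = [0.0] * length, [0.0] * length
    for index, (samples, params) in enumerate(zip(tracks, predicted)):
        params = overrides.get(index, params)  # manual fine-tuning takes priority
        ch_left, ch_right = apply_channel(samples, params)
        for i, (l, r) in enumerate(zip(ch_left, ch_right)):
            left[i] += l
            right[i] += r
    return left, right
```

In this sketch, a call such as `mix(tracks, predicted)` stands in for the automatically generated initial mix, while `mix(tracks, predicted, overrides={1: ChannelParams(gain_db=-3.0, pan=0.8)})` stands in for a user adjusting one channel by hand afterwards. A real mixing console would also include effects such as equalisation and compression, which are omitted here.
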
### Target Users:

- **Amateurs**: Those who wish to obtain high-quality automatic mixing results.
- **Professional Users**: Those who wish to make further manual adjustments on the basis of automation to improve work efficiency.

Through these methods, Diff-MSTC aims to provide an efficient and flexible mixing tool for users of different skill levels.

Diff-MSTC: A Mixing Style Transfer Prototype for Cubase
