Diff-MSTC: A Mixing Style Transfer Prototype for Cubase

Soumya Sai Vanka, Lennart Hannink, Jean-Baptiste Rolland, George Fazekas
2024-11-11
Abstract: In our demo, participants are invited to explore the Diff-MSTC prototype, which integrates the Diff-MST model into Steinberg's digital audio workstation (DAW), Cubase. Diff-MST, a deep learning model for mixing style transfer, forecasts mixing console parameters for tracks using a reference song. The system processes up to 20 raw tracks along with a reference song to predict mixing console parameters that can be used to create an initial mix. Users have the option to manually adjust these parameters further for greater control. In contrast to earlier deep learning systems that are limited to research ideas, Diff-MSTC is a first-of-its-kind prototype integrated into a DAW. This integration facilitates mixing decisions on multitracks and lets users input context through a reference song, followed by fine-tuning of audio effects in a traditional manner.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses how deep learning can automate multitrack music mixing while remaining usable in real production workflows. It introduces Diff-MSTC, a prototype that integrates the Diff-MST model into Steinberg's digital audio workstation (DAW), Cubase, to perform mixing style transfer guided by a reference song.

### Main Issues:
1. **Automation and Controllability**: Existing automatic mixing systems often lack user control, whereas professional mixing engineers need to balance automation with manual adjustment. Diff-MSTC aims to provide a system that is automated yet remains manually adjustable.
2. **Context Awareness**: Traditional automatic mixing systems often overlook the role of context in the mixing process. Diff-MSTC uses a reference song to convey the user's intent and generates initial mixing parameters accordingly.
3. **Practical Application**: Although many deep learning-based mixing systems have been proposed, few can operate inside an actual DAW. Diff-MSTC is a prototype integrated into Cubase, which allows it to be tested and applied in real workflows.

### Solution:
- **Diff-MST Model**: A deep learning model that predicts mixing console parameters, which can be used to generate an initial mix. The reference song supplies the context that conveys the user's intent.
- **Integration into Cubase**: Diff-MSTC runs as a plugin for Cubase, so users can work with it in an actual music production environment. Users select a segment of the reference song and choose multitrack audio from the project, and the system generates the corresponding mixing parameters.
- **User Interface**: The interface lets users select reference songs, choose audio segments, and view and adjust the generated mixing parameters.
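
To make "predicted mixing console parameters" concrete, here is a minimal sketch of how such parameters could be turned into an initial stereo mix. It is an illustration under stated assumptions, not the Diff-MST implementation or the Cubase API. The source does not list the console's parameters, so per-track gain and pan are assumed here as the simplest console controls. `TrackParams`, `apply_console`, and the hard-coded `predicted` values are hypothetical stand-ins for the model's output. Only the 20-track limit comes from the abstract.

```python
import math
from dataclasses import dataclass

MAX_TRACKS = 20  # track limit stated in the abstract


@dataclass
class TrackParams:
    """Hypothetical per-track console settings (names are illustrative)."""
    gain_db: float  # fader gain in decibels
    pan: float      # 0.0 = hard left, 0.5 = centre, 1.0 = hard right


def apply_console(tracks, params):
    """Mix mono tracks to stereo using per-track gain and constant-power pan."""
    if len(tracks) > MAX_TRACKS:
        raise ValueError(f"at most {MAX_TRACKS} tracks are supported")
    if len(tracks) != len(params):
        raise ValueError("need exactly one TrackParams per track")
    length = max((len(t) for t in tracks), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, p in zip(tracks, params):
        gain = 10.0 ** (p.gain_db / 20.0)  # decibels -> linear amplitude
        theta = p.pan * math.pi / 2.0      # constant-power pan law
        g_left, g_right = gain * math.cos(theta), gain * math.sin(theta)
        for i, s in enumerate(samples):
            left[i] += g_left * s
            right[i] += g_right * s
    return left, right


# Assumed stand-in for model output; a real system would predict these
# values from the raw tracks and the reference song.
predicted = [TrackParams(gain_db=-6.0, pan=0.5), TrackParams(gain_db=0.0, pan=0.0)]
tracks = [[1.0, 1.0], [0.5, -0.5]]  # two short mono tracks
left, right = apply_console(tracks, predicted)
```

Because the output is a set of ordinary console settings rather than rendered audio, a user can still change any value by hand after the initial mix, which is the controllability the summary emphasises.
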

### Target Users:
- **Amateurs**: Those who wish to obtain high-quality automatic mixing results.
- **Professional Users**: Those who wish to make further manual adjustments on top of the automated result to improve work efficiency.

Through these methods, Diff-MSTC aims to provide an efficient and flexible mixing tool for users of different skill levels.