LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

Siting Xu, Yunlong Tang, Feng Zheng
2023-07-23
Abstract: Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of Launchpad light effects, and to give beginners a more accessible way to create music visualizations with this instrument, we propose the LaunchpadGPT model, which generates music visualization designs on Launchpad automatically. Built on a language model with strong generation ability, LaunchpadGPT takes a piece of music as audio input and outputs the lighting effects of Launchpad playing in the form of a video (a Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and the corresponding Launchpad-playing video frames as prompt-completion pairs for training the language model. Experimental results show that the proposed method creates better music visualizations than random generation methods and holds potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.
Sound, Computation and Language, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses how to automatically generate Launchpad music visualization designs, in order to assist and inspire beginners who create music visualizations with this instrument. Specifically, the authors propose LaunchpadGPT, a language-model-based method that automatically generates Launchpad lighting-effect videos matching the input music. The method aims to simplify the design process so that even beginners can create high-quality music visualizations.

### Main contributions of the paper

1. **Propose LaunchpadGPT**: A language model based on the Generative Pretrained Transformer (GPT) that automatically generates Launchpad music visualizations for a given piece of music.
2. **Construct a dataset**: The authors collected 16 Launchpad performance videos and extracted the music and corresponding video frames from them to build a dataset of prompt-completion pairs for training the model.

### Method overview

1. **Feature extraction**:
   - Extract Mel-Frequency Cepstral Coefficient (MFCC) features from the music.
   - Extract the color information (RGB values) and coordinates (X) of each button from the video frames.
2. **Prompt-completion pair construction**:
   - Convert MFCC features into text form as the prompt.
   - Convert RGB-X tuples into text form as the completion.
3. **Model training**:
   - Train a NanoGPT model, updating the model parameters with teacher forcing.
4. **Inference stage**:
   - Input the MFCC features of the music, generate the corresponding RGB-X tuples, and then render them into video frames.

### Experimental results

1. **Dataset**:
   - 16 Launchpad performance videos with a total duration of 3312 seconds and approximately 82800 frames (about 25 frames per second).
2. **Evaluation metrics**:
   - The Fréchet Video Distance (FVD) is used to evaluate the quality of the generated music visualization videos.
3. **Quantitative experiments**:
   - Compared with the random generation methods (Random-RGB and Random-RGBX), the videos generated by LaunchpadGPT have lower FVD scores, indicating that they are closer to videos made manually by humans.
4. **Result visualization**:
   - The generated videos show that LaunchpadGPT can produce color-coordinated visual effects synchronized with the music, which the random methods cannot.

### Conclusion

LaunchpadGPT achieves lower FVD than the random baselines and can automatically generate color-coordinated visual effects synchronized with the music. This simplifies the creation of music visualizations and also opens possibilities for fields such as music game design, concert LED screen design, and lighting schemes for music dance floors.
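
The prompt-completion construction and the inference-time decoding described in the method overview can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' format: the function names, the space-separated integer encoding of MFCC values, the `r,g,b,x` text layout with `;` separators, and the 8x8 grid addressed by a flat index `X`. MFCC extraction itself is left out; the prompt function takes an already computed coefficient vector.

```python
from typing import List, Tuple

# Hypothetical representation: one (R, G, B, X) tuple per lit button,
# where X is a flat button index. The 8x8 grid is an assumption.
RGBX = Tuple[int, int, int, int]
GRID_SIDE = 8
GRID_BUTTONS = GRID_SIDE * GRID_SIDE


def mfcc_to_prompt(mfcc: List[float]) -> str:
    """Serialize one frame's MFCC vector as space-separated rounded integers."""
    return " ".join(str(round(c)) for c in mfcc)


def rgbx_to_completion(buttons: List[RGBX]) -> str:
    """Serialize lit buttons as 'r,g,b,x' groups separated by ';'."""
    return ";".join(",".join(str(v) for v in b) for b in buttons)


def completion_to_rgbx(text: str) -> List[RGBX]:
    """Parse generated text back into tuples, skipping malformed groups."""
    buttons: List[RGBX] = []
    for group in text.split(";"):
        parts = group.strip().split(",")
        if len(parts) != 4:
            continue  # wrong number of fields
        try:
            r, g, b, x = (int(p) for p in parts)
        except ValueError:
            continue  # non-numeric field
        if all(0 <= c <= 255 for c in (r, g, b)) and 0 <= x < GRID_BUTTONS:
            buttons.append((r, g, b, x))
    return buttons


def render_frame(buttons: List[RGBX]) -> List[List[Tuple[int, int, int]]]:
    """Place each button's color on a grid; unlit buttons stay black."""
    frame = [[(0, 0, 0)] * GRID_SIDE for _ in range(GRID_SIDE)]
    for r, g, b, x in buttons:
        frame[x // GRID_SIDE][x % GRID_SIDE] = (r, g, b)
    return frame


if __name__ == "__main__":
    prompt = mfcc_to_prompt([12.4, -3.6, 0.2])
    completion = rgbx_to_completion([(255, 0, 0, 0), (0, 128, 255, 63)])
    print(prompt, "->", completion)
```

The parser is deliberately tolerant, since text sampled from a language model may contain malformed groups; dropping them rather than raising keeps frame rendering robust at inference time.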