Music Performance Style Transfer for Learning Expressive Musical Performance

Zhe Xiao, Xin Chen, Li Zhou
DOI: https://doi.org/10.1007/s11760-023-02788-5
IF: 1.583
2024-01-01
Signal Image and Video Processing
Abstract: Generating expressive musical performance (EMP) is an active topic in the field of music generation. Music played by humans is generally more expressive than music produced by machines, and understanding why requires exploring the role of human performance in the production of music. This paper proposes a performance style transfer model that learns human performance style and implements an EMP system. The model is built on generative adversarial networks (GANs) and takes as input a multi-channel image composed of four elaborated spectrograms, which is used to decompose and reconstruct music audio. To ensure training stability, we design a multi-channel consistency loss for the GANs. Furthermore, given the lack of objective evaluation criteria for music generation, we propose a hybrid evaluation method that combines qualitative and quantitative methods to evaluate human-needs satisfaction. Three quantitative criteria are proposed, covering the feature and audio levels. The effectiveness of our method is verified on a public dataset through objective evaluation, which shows that it is comparable to state-of-the-art algorithms. Additionally, subjective evaluations are conducted through visual analyses of both audio content and style. Finally, we conduct a musical Turing test in which subjects score the performance of the generated music. The experimental results show that our method is competitive.