SymforNet: application of cross-modal information correspondences based on self-supervision in symbolic music generation

Halidanmu Abudukelimu, Jishang Chen, Yunze Liang, Abudukelimu Abulizi, Alimujiang Yasen
DOI: https://doi.org/10.1007/s10489-024-05335-y
IF: 5.3
2024-03-21
Applied Intelligence
Abstract: In this study, we address challenges related to incorrect scores and inconsistent rhythm and labeling in the generation of symbolic music scores, with a focus on the use of self-supervised models. We present SymforNet, a model for symbolic music generation based on self-supervision and deep learning. The model incorporates an attention mechanism and demonstrates exceptional proficiency in recognizing contextual elements of various categories. Experimental results indicate that: (1) the SymforNet model achieves 88% accuracy in generating music scores; (2) on both the training and test sets, the SymforNet model exhibits significantly better loss values than all baseline models; (3) an examination of the multi-track Used Pitch Class data reveals that the SymforNet model displays a strong correlation, particularly for sequences comprising three to four tracks; (4) in a comparison of music score quality, SymforNet generates correct scores at a rate of 87%.
computer science, artificial intelligence