Multi-task Melody Extraction Using Feature Optimization and CRNN-CRF.

Hongmei Li, Lihua Tian, Chen Li
DOI: https://doi.org/10.1016/j.compeleceng.2023.108605
IF: 4.152
2023-01-01
Computers & Electrical Engineering
Abstract: Mainstream melody extraction now relies mostly on deep learning, but problems remain, such as incomplete network architectures and a lack of research on how important the input features are for melody extraction. To predict the melody more accurately, we first extract audio features with a short-time Fourier transform (STFT) followed by phase correction. Second, we use a weight-based, multi-task feature optimization mechanism that assigns weights to the input features of the main and auxiliary networks, so as to fully capture the important parts of the melody's time sequence. Third, to exploit the context information of both features and label sequences, we propose a multi-task pitch extraction network based on a Convolutional Recurrent Neural Network-Conditional Random Field (CRNN-CRF) to decode the optimal label sequences. In addition, we train and optimize the model with a joint loss function over the two tasks to improve the accuracy of pitch prediction. Evaluation on several datasets, together with visual comparison, demonstrates the superiority of the proposed method.
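The abstract says the CRF layer decodes the optimal label sequence but does not name the decoding algorithm. As a minimal sketch, assuming a linear-chain CRF decoded with the standard Viterbi algorithm, the step could look like the following. The function name `viterbi_decode`, the score layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path for a linear-chain CRF.

    emissions[t][k]: score of label k at frame t (assumed to come from a CRNN).
    transitions[j][k]: score of moving from label j to label k.
    Both layouts are assumptions made for this sketch.
    """
    num_labels = len(transitions)
    scores = list(emissions[0])          # best score of any path ending in each label
    backpointers = []                    # best previous label per frame and label

    for frame in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(num_labels):
            # Pick the previous label that maximizes the path score into k.
            best_prev = max(
                range(num_labels),
                key=lambda j: scores[j] + transitions[j][k],
            )
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + frame[k])
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full path.
    path = [max(range(num_labels), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example: two labels, three frames (made-up scores).
emissions = [[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
smooth = [[1.0, -2.0], [-2.0, 1.0]]      # rewards staying on the same label
print(viterbi_decode(emissions, smooth))
```

With these toy scores, the per-frame best labels alone would switch label in the middle frame, while the transition scores make the decoder keep one label throughout. This illustrates how label-sequence context can override an isolated frame-level decision.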