Vocal Melody Extraction via DNN-Based Pitch Estimation and Salience-Based Pitch Refinement

Yongwei Gao, Bilei Zhu, Wei Li, Ke Li, Yongjian Wu, Feiyue Huang
DOI: https://doi.org/10.1109/icassp.2019.8683608
2019-01-01
Abstract: Data-driven methods for melody extraction from polyphonic music generally require large amounts of labeled data for model training. However, musical data annotated with melody fundamental frequency (F0) are rare and hard to obtain. To overcome this limitation, we propose using melody MIDI files, which are far more widely available, as the source of labels for training a deep neural network (DNN) for melody extraction. For each test audio clip, the pitch sequence estimated by the DNN consists of note numbers quantized at the semitone level, so its resolution is relatively low. We therefore further propose a salience-based method that refines the DNN pitch estimate to a resolution of 10 cents. Experimental results on three public datasets indicate that our method outperforms four state-of-the-art melody extraction methods in most cases.
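The abstract does not define the salience function, so the following is only a minimal sketch of the general idea of refining a semitone-level estimate to a 10-cent grid. It assumes a toy harmonic-summation salience computed over a list of (frequency, magnitude) spectral peaks; the names `midi_to_hz`, `harmonic_salience` and `refine_pitch`, and the parameters `alpha`, `tol_cents` and `search_cents`, are hypothetical and not taken from the paper. The MIDI note is converted to Hz (A4 = note 69 = 440 Hz), and candidates within ±50 cents of it are scored in 10-cent steps.

```python
import math


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harmonic_salience(f0, peaks, n_harmonics=5, alpha=0.8, tol_cents=30.0):
    """Toy salience (hypothetical): weighted sum of peak magnitudes near harmonics of f0.

    Each peak contributes alpha**(h-1) * mag * w, where w falls linearly from 1
    (exact harmonic match) to 0 (tol_cents away), so closer candidates score higher.
    """
    total = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        for freq, mag in peaks:
            # Distance in cents between the spectral peak and the h-th harmonic.
            dist = abs(1200.0 * math.log2(freq / target))
            if dist <= tol_cents:
                total += (alpha ** (h - 1)) * mag * (1.0 - dist / tol_cents)
    return total


def refine_pitch(note, peaks, step_cents=10, search_cents=50):
    """Refine a semitone-level MIDI note to the step_cents grid by maximizing salience."""
    base = midi_to_hz(note)
    best_f, best_s = base, -1.0
    # Scan candidates from -search_cents to +search_cents around the semitone pitch.
    for offset in range(-search_cents, search_cents + 1, step_cents):
        cand = base * 2.0 ** (offset / 1200.0)
        s = harmonic_salience(cand, peaks)
        if s > best_s:
            best_f, best_s = cand, s
    return best_f
```

For example, if a DNN frame outputs note 69 while the sung F0 is actually 20 cents sharp, `refine_pitch(69, peaks)` returns the candidate 20 cents above 440 Hz, because only that grid point aligns exactly with every harmonic peak. A real system would compute salience from the spectrum of each frame rather than from a hand-built peak list.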