Improving Automatic Piano Transcription by Refined Feature Fusion and Weighted Loss

Zhao Jiahao, Wu Yulun, Wen Liang, Ma Lianhang, Ruan Linping, Wang Wantao, Li Wei
DOI: https://doi.org/10.1007/978-981-19-4703-2_4
2022-01-01
Abstract: Automatic piano transcription converts raw audio files into annotated piano rolls. Recent studies commonly estimate the pitch, onset, offset, and velocity of each note jointly. The previous state-of-the-art “Onsets and Frames” model concatenates the outputs of the onset and offset sub-tasks with the extracted features to improve frame-wise pitch detection, which we consider inefficient. In this paper, we propose an improved piano transcription model that uses refined feature fusion and loss weighting. The proposed model outperforms the baselines by a large margin. It also performs comparably to the state-of-the-art “High-Res PT” model on note metrics and outperforms it on frame metrics, with an F1 score of 90.27%.
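For readers unfamiliar with the frame metric quoted in the abstract, the following is a minimal sketch of how a frame-wise F1 score is typically computed from binary piano rolls. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `frame_f1` and the list-of-lists piano-roll layout (one row per frame, one 0/1 entry per pitch) are hypothetical choices made here.

```python
def frame_f1(pred, ref):
    """Frame-wise F1 between two binary piano rolls.

    pred, ref: lists of frames, each frame a list of 0/1 values per pitch.
    (Hypothetical layout chosen for this sketch.)
    """
    tp = fp = fn = 0
    for pred_frame, ref_frame in zip(pred, ref):
        for p, r in zip(pred_frame, ref_frame):
            if p and r:
                tp += 1      # pitch active in both prediction and reference
            elif p:
                fp += 1      # predicted active, but not in the reference
            elif r:
                fn += 1      # active in the reference, but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Two frames, three pitches: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 0, 1], [0, 1, 0]]
ref = [[1, 0, 0], [0, 1, 1]]
score = frame_f1(pred, ref)
```

Because every frame-pitch bin counts equally, this metric rewards getting sustained notes right across their full duration, which is why it is reported separately from the note-level metrics.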