Harmonic Frequency-Separable Transformer for Instrument-Agnostic Music Transcription

Yulun Wu, Weixing Wei, Dichucheng Li, Mengbo Li, Yi Yu, Yongwei Gao, Wei Li
DOI: https://doi.org/10.1109/icme57554.2024.10688217
2024-01-01
Abstract: Automatic Music Transcription (AMT) aims to convert music audio into symbolic representations. Recently, transformer-based methods have been successfully applied to instrument-agnostic music transcription, so that transcription models no longer need to focus on the characteristics of a specific instrument class. However, the designs of these transformer-based methods for AMT were mainly motivated by other research fields and rely on additional large-scale datasets, without considering the intrinsic features and patterns of music signals. In this paper, we propose the Harmonic Frequency-Separable Transformer (HFSFormer), which provides effective prior information based on music knowledge for instrument-agnostic transcription. HFSFormer captures the harmonic structure of music and separates time-frequency representations to decouple multiple pitches and different timbres, allowing it to model a note’s onset, offset, and pitch more explicitly. Experimental results show that the proposed method outperforms state-of-the-art methods on public datasets while having an order of magnitude fewer parameters.