Learning optimal features for music transcription

Huaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li
DOI: https://doi.org/10.1109/ChinaSIP.2014.6889211
2014-01-01
Abstract: This paper aims to design time-frequency representation (TFR) functions for automatic music transcription. It is desirable that the decomposition of those TFR functions be suitable for notes whose pitch and spectral envelope both vary over time. The Harmonic Adaptive Latent Component Analysis (HALCA) model adopted in this paper allows both kinds of variation to be considered simultaneously. We evaluate the influence of three TFR functions, namely the IIR and FIR filter bank semigrams (FBSG) and the constant-Q transform (CQT) semigram, on the automatic music transcription task, using a database of popular and polyphonic classical music. The experimental results show that the filter bank based representations are suitable for multiple-instrument recordings, while the CQT-based representation provides very accurate transcription for solo-instrument recordings.