GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model

Xavier Riley, Zixun Guo, Drew Edwards, Simon Dixon
2024-08-30
Abstract: We introduce GAPS (Guitar-Aligned Performance Scores), a new dataset of classical guitar performances, and a benchmark guitar transcription model that achieves state-of-the-art performance on GuitarSet in both supervised and zero-shot settings. GAPS is the largest dataset of real guitar audio, containing 14 hours of freely available audio-score aligned pairs, recorded in diverse conditions by over 200 performers, together with high-resolution note-level MIDI alignments and performance videos. These enable us to train a state-of-the-art model for automatic transcription of solo guitar recordings which can generalise well to real world audio that is unseen during training.
Sound; Audio and Speech Processing
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of high-quality datasets for automatic music transcription (AMT) of classical guitar in the field of music information retrieval (MIR). Specifically:

1. **Lack of high-quality guitar datasets**: The piano has comprehensive datasets such as MAESTRO and MAPS, but comparable high-quality guitar datasets are very limited. This constrains the accuracy and development of guitar AMT systems.
2. **Improving automatic music transcription performance**: To improve guitar AMT systems, researchers need a dataset containing a large amount of real audio together with matching scores and high-quality annotations. Existing guitar datasets such as GuitarSet and EGDB are small and lack diversity, so they are insufficient for fully training and validating high-performance AMT models.

To address these problems, the authors introduce GAPS (Guitar-Aligned Performance Scores), a new large-scale classical guitar dataset containing 14 hours of real guitar audio together with matching scores, high-resolution MIDI annotations, and performance videos. The recordings were made by more than 200 performers under varied recording conditions, giving the dataset considerable diversity and a wide range of applications. The authors also use GAPS to train a benchmark guitar transcription model, which achieves state-of-the-art performance in both supervised and zero-shot settings.

With this dataset and model, the authors hope to advance guitar AMT and to support research on related MIR tasks such as score following, performance analysis, generative music modeling, and the study of expressive performance timing.
In summary, this paper aims to fill the gaps in existing research by providing a large-scale, diverse guitar dataset and to significantly improve the performance of guitar AMT systems.