Improved symbolic drum style classification with grammar-based hierarchical representations

Léo Géré, Philippe Rigaux, Nicolas Audebert
2024-07-24
Abstract: Deep learning models have become a critical tool for analysis and classification of musical data. These models operate either on the audio signal, e.g. waveform or spectrogram, or on a symbolic representation, such as MIDI. In the latter, musical information is often reduced to basic features, i.e. durations, pitches and velocities. Most existing works then rely on generic tokenization strategies from classical natural language processing, or matrix representations, e.g. piano roll. In this work, we evaluate how enriched representations of symbolic data can impact deep models, i.e. Transformers and RNNs, for music style classification. In particular, we examine representations that explicitly incorporate musical information implicitly present in MIDI-like encodings, such as rhythmic organization, and show that they outperform generic tokenization strategies. We introduce a new tree-based representation of MIDI data built upon a context-free musical grammar. We show that this grammar representation accurately encodes high-level rhythmic information and outperforms existing encodings on the GrooveMIDI Dataset for drumming style classification, while being more compact and parameter-efficient.
Sound, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses how to represent symbolic music data (such as MIDI files) effectively in Music Information Retrieval (MIR), focusing on drum performance style classification, so that deep-learning models perform better. Specifically, the authors improve on existing symbolic drum-style classification methods by introducing a hierarchical tree representation based on a context-free grammar.

#### Main problems:

1. **Limitations of existing representations**: Most existing MIDI representations contain only low-level information (such as the start time, duration, and intensity of notes) and ignore higher-level musical information (such as rhythmic structure, harmony, etc.). The model must therefore learn these high-level features from the data, which increases training difficulty and computational cost.
2. **Improving classification performance**: The authors expect that a richer representation of musical information lets the deep-learning model use these high-level features directly, improving classification performance and reducing the number of parameters needed.

#### Solutions:

- **Linearized Rhythmic Tree (LRT)**: A tree structure built from a context-free grammar that explicitly encodes the rhythmic information in MIDI data. This representation is more compact and better captures the high-level structure of the music.
- **Tree-based positional encoding**: To preserve the tree structure inside the Transformer model, the authors introduce a tree-based positional encoding (TBPE), which helps the model capture the hierarchical relationships in the music.
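The general idea behind a linearized tree with tree-aware positions can be illustrated with a small sketch. This is not the paper's grammar or its TBPE: the `Node` class, the `splitN` tokens, and the path tuples are hypothetical names chosen for illustration. The sketch assumes a bar is recursively split into equal parts, that leaves carry a drum symbol, and that a depth-first traversal yields the token sequence, with each token's path from the root standing in for its position in the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical rhythm-tree node: either a leaf symbol or an equal split."""
    symbol: Optional[str] = None                      # leaf label, e.g. "kick"
    children: List["Node"] = field(default_factory=list)


def linearize(node: Node, path: Tuple[int, ...] = ()) -> List[Tuple[str, Tuple[int, ...]]]:
    """Pre-order traversal returning (token, path-from-root) pairs.

    The path (sequence of child indices) is an assumed stand-in for the
    information a tree-based positional encoding would give the model.
    """
    if not node.children:
        return [(node.symbol, path)]
    out = [(f"split{len(node.children)}", path)]      # internal node: arity token
    for i, child in enumerate(node.children):
        out.extend(linearize(child, path + (i,)))
    return out


# One bar split into two halves: a kick, then a half split into snare + rest.
bar = Node(children=[
    Node("kick"),
    Node(children=[Node("snare"), Node("rest")]),
])

tokens = linearize(bar)
for token, path in tokens:
    print(token, path)
```

In this toy encoding the snare's path `(1, 0)` says "second half of the bar, first subdivision", which is the kind of hierarchical rhythmic position a flat token index does not expose.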
#### Experimental results:

- In experiments on the GrooveMIDI dataset, the authors show that the LRT representation outperforms existing representations on the drum performance style classification task. In particular, the Transformer model combined with TBPE achieves the highest F1 score (about 0.66) on the test set while using about six times fewer parameters than the traditional LSTM model.

In conclusion, the paper addresses the over-reliance of existing methods on low-level information by introducing a new MIDI representation, which improves both classification performance and model efficiency.