Abstract: Deep learning models have become a critical tool for analysis and classification of musical data. These models operate either on the audio signal, e.g. waveform or spectrogram, or on a symbolic representation, such as MIDI. In the latter, musical information is often reduced to basic features, i.e. durations, pitches and velocities. Most existing works then rely on generic tokenization strategies from classical natural language processing, or matrix representations, e.g. piano roll. In this work, we evaluate how enriched representations of symbolic data can impact deep models, i.e. Transformers and RNNs, for music style classification. In particular, we examine representations that explicitly incorporate musical information implicitly present in MIDI-like encodings, such as rhythmic organization, and show that they outperform generic tokenization strategies. We introduce a new tree-based representation of MIDI data built upon a context-free musical grammar. We show that this grammar representation accurately encodes high-level rhythmic information and outperforms existing encodings on the GrooveMIDI Dataset for drumming style classification, while being more compact and parameter-efficient.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses how to represent symbolic music data (such as MIDI files) effectively in Music Information Retrieval (MIR), specifically for drum performance style classification, so that deep-learning models perform better. The authors improve on existing symbolic drum-style classification methods by introducing a hierarchical tree representation based on a context-free grammar.

#### Main problems:

1. **Limitations of existing representations**: Most existing MIDI representations contain only low-level information (such as the start time, duration, and intensity of notes) and ignore higher-level musical information (such as rhythmic structure and harmony). The model must then learn to extract these high-level features from the data itself, which increases training difficulty and computational cost.
2. **Improving classification performance**: The authors expect that a richer representation of musical information lets the deep-learning model use these high-level features directly, improving classification performance while requiring fewer parameters.

#### Solutions:

- **Linearized Rhythmic Tree (LRT)**: A tree structure built from a context-free grammar that explicitly encodes the rhythmic information in MIDI data. This representation is more compact and better captures the high-level structure of the music.
- **Tree-based positional encoding (TBPE)**: To preserve the tree structure inside the Transformer model, the authors introduce a tree-based positional encoding, which lets the model take the hierarchical relationships in the music into account.
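As a rough illustration of the two ideas above, the following Python sketch builds a small rhythm tree, linearizes it depth-first into a token sequence, and records each token's path from the root as a stand-in for a tree-based position. The `Node` class, the `divN` tokens, the drum labels, and the path-as-position scheme are all assumptions made for this example; the paper's actual grammar, vocabulary, and TBPE formulation are not given in this summary and may differ.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class Node:
    """Hypothetical rhythm-tree node: an inner node splits its time span
    equally among its children; a leaf carries an event label."""
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def linearize(node, path=()):
    """Depth-first traversal returning (token, path) pairs.

    Inner nodes emit a division token such as "div2"; leaves emit their label.
    The path (child indices from the root) is an assumed, simplified
    tree position for each token."""
    if not node.children:
        return [(node.label, path)]
    out = [(f"div{len(node.children)}", path)]
    for i, child in enumerate(node.children):
        out.extend(linearize(child, path + (i,)))
    return out


def onsets(node, start=Fraction(0), span=Fraction(1)):
    """Recover the onset of every leaf as a fraction of the bar."""
    if not node.children:
        return [(start, node.label)]
    step = span / len(node.children)
    out = []
    for i, child in enumerate(node.children):
        out.extend(onsets(child, start + i * step, step))
    return out


# Toy bar (invented for illustration): the first half holds a kick then a
# rest, the second half holds a triplet of snare hits.
bar = Node(children=[
    Node(children=[Node("kick"), Node("rest")]),
    Node(children=[Node("snare"), Node("snare"), Node("snare")]),
])

pairs = linearize(bar)
tokens = [tok for tok, _ in pairs]
paths = [p for _, p in pairs]
print(tokens)
print(paths)
print(onsets(bar))
```

In this toy encoding, the triplet is stated once by a single `div3` token instead of being implied by three separate timestamps, which is the sense in which a tree can make rhythmic organization explicit.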
#### Experimental results:

- In experiments on the GrooveMIDI dataset, the authors show that the LRT representation significantly outperforms existing representations on the drum performance style classification task. In particular, the Transformer model combined with TBPE achieved the highest F1 score on the test set (about 0.66) while requiring 6 times fewer parameters than the traditional LSTM model.

In conclusion, the paper addresses the over-dependence of existing methods on low-level information by introducing a new MIDI representation, improving both classification performance and model efficiency.
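For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score can be computed for a multi-class style classifier. It assumes macro averaging (the unweighted mean of per-class F1); the summary does not say which averaging scheme the paper uses. The function name `macro_f1` and the toy labels are invented for illustration and are not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, not taken from GrooveMIDI.
y_true = ["rock", "rock", "jazz", "funk"]
y_pred = ["rock", "jazz", "jazz", "funk"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

With these toy labels, the per-class scores are 2/3 for rock, 2/3 for jazz, and 1 for funk, so the macro average is 7/9.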

Improved symbolic drum style classification with grammar-based hierarchical representations

Impact of time and note duration tokenizations on deep learning symbolic music modeling

Modelling Symbolic Music: Beyond the Piano Roll

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music.

Improving Musical Instrument Classification with Advanced Machine Learning Techniques

Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a Style

Byte Pair Encoding for Symbolic Music

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Conditional Drums Generation using Compound Word Representations

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

End-to-end Piano Performance-MIDI to Score Conversion with Transformers

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision

Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers

A transformers-based approach for fine and coarse-grained classification and generation of MIDI songs and soundtracks

Supervised Symbolic Music Style Translation Using Synthetic Data

MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer