Beat this! Accurate beat tracking without DBN postprocessing

Francesco Foscarin, Jan Schlüter, Gerhard Widmer
2024-07-31
Abstract: We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.
Sound, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper aims to improve the accuracy and generality of musical beat and downbeat tracking, especially for music with complex rhythmic changes and uncommon time signatures. Specifically, it focuses on the following aspects:

1. **Removing DBN postprocessing**: The paper proposes a system that improves generality and accuracy by dropping the commonly used Dynamic Bayesian Network (DBN) postprocessing step. A DBN constrains the meter and the tempo, so it can fail on pieces with time-signature changes, tempos outside its modeled range, or a number of beats per bar that is not in its list of supported values.
2. **Improving generalization**: The authors train on multiple datasets, including solo-instrument recordings, pieces with time-signature changes, and classical music with high tempo variations, so that the model adapts better to different kinds of music.
3. **Improving the loss function and architecture**: For accuracy, the authors develop a loss function that tolerates small time shifts of the annotations, and an architecture that alternates convolutions with transformers applied either over the frequency or over the time dimension, which helps capture complex structure in the music signal.
4. **Addressing weaker continuity metrics**: Although the system surpasses the state of the art in F1 score, it performs worse on continuity metrics such as CMLt and AMLt. The authors discuss possible reasons, including that the loss function does not specifically penalize non-periodic predictions and that the datasets contain some non-periodic annotations.
5. **Open-sourcing code and preprocessed datasets**: To encourage further research, the authors release their model, code, and preprocessed datasets, and invite other researchers to beat their results.

In summary, the main goal of the paper is an accurate beat and downbeat tracking system that does not rely on DBN postprocessing, performs well across diverse types of music, and provides a strong foundation for future research.
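The DBN limitation described above can be made concrete with a small sketch: given beat and downbeat annotations, split the beats into bars, derive each bar's number of beats and local tempo, and flag bars that a DBN configured for a fixed set of meters and a fixed tempo range could not represent. The function names are hypothetical, and the supported meters (3 and 4) and the 55 to 215 BPM range are illustrative assumptions, not the configuration of any particular library or of the paper.

```python
def bars_from_annotations(beats, downbeats):
    """Return (beats_per_bar, bpm) for every complete bar between consecutive downbeats.

    Assumes downbeat times also appear in the beat list (downbeats are a subset of beats).
    Beats after the last downbeat are ignored because that bar is incomplete.
    """
    beats = sorted(beats)
    downbeats = sorted(set(downbeats))
    bars = []
    for start, end in zip(downbeats, downbeats[1:]):
        n_beats = sum(1 for b in beats if start <= b < end)
        # n_beats inter-beat intervals span the bar, so the mean interval is (end - start) / n_beats
        bpm = 60.0 * n_beats / (end - start)
        bars.append((n_beats, bpm))
    return bars


def bars_outside_dbn(bars, supported_meters=(3, 4), min_bpm=55.0, max_bpm=215.0):
    """Indices of bars whose meter or tempo falls outside a hypothetical DBN configuration."""
    return [
        i
        for i, (n_beats, bpm) in enumerate(bars)
        if n_beats not in supported_meters or not (min_bpm <= bpm <= max_bpm)
    ]


if __name__ == "__main__":
    # 120 BPM throughout, but the meter changes from 4 to 3 to 7 beats per bar
    beats = [i * 0.5 for i in range(15)]
    downbeats = [0.0, 2.0, 3.5, 7.0]
    bars = bars_from_annotations(beats, downbeats)
    print(bars)                    # [(4, 120.0), (3, 120.0), (7, 120.0)]
    print(bars_outside_dbn(bars))  # [2]
```

A single out-of-range bar is enough to force such a model into a wrong metrical interpretation for that passage, which is the kind of failure a system without a DBN avoids.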
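A loss that tolerates small time shifts of the annotations can be sketched as follows. This is one possible construction, not necessarily the authors' exact loss: the predicted beat probabilities are max-pooled over a window of plus or minus `tol` frames before being compared with each annotated beat, and non-beat frames inside that window are excluded from the negative term, so a peak landing a frame or two away from the annotation is not penalized twice. The window size `tol=3`, the clipping constant, the absence of class weighting, and all function names are assumptions; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def max_filter_1d(x, radius):
    """Sliding maximum over [t - radius, t + radius]; out-of-range positions are ignored."""
    padded = np.pad(np.asarray(x, dtype=float), radius, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * radius + 1)
    return windows.max(axis=1)


def plain_bce(prob, target, eps=1e-7):
    """Ordinary frame-wise binary cross-entropy, for comparison."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    return float(np.mean(-(target * np.log(prob) + (1 - target) * np.log(1 - prob))))


def shift_tolerant_bce(prob, target, tol=3, eps=1e-7):
    """Binary cross-entropy that accepts a predicted peak up to `tol` frames from an annotation."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)
    is_beat = np.asarray(target, dtype=bool)
    # Positive term: best prediction within +-tol frames of each annotated beat
    pooled = max_filter_1d(prob, tol)
    pos_loss = -np.log(pooled[is_beat])
    # Negative term: only frames farther than tol from every annotated beat
    near_beat = max_filter_1d(is_beat.astype(float), tol) > 0
    neg_loss = -np.log(1 - prob[~near_beat])
    n_terms = pos_loss.size + neg_loss.size
    return float((pos_loss.sum() + neg_loss.sum()) / n_terms)


if __name__ == "__main__":
    target = np.zeros(20)
    target[10] = 1                           # annotated beat at frame 10
    pred = np.full(20, 0.01)
    pred[12] = 0.9                           # the network fires 2 frames late
    print(plain_bce(pred, target))           # large: the miss and the false alarm are both penalized
    print(shift_tolerant_bce(pred, target))  # small: the 2-frame shift is tolerated
```

A shift beyond the window (for example a peak at frame 16) is still penalized, so the tolerance only absorbs small annotation inaccuracies rather than arbitrary errors.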
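Running transformers "either over frequency or time" comes down to which axis of the feature map the attention operates on. The NumPy sketch below is a stripped-down illustration under our own assumptions (a single attention head, random untrained weights, a residual connection, and no layer normalization, feed-forward block, or positional encoding): in the frequency direction, every time frame is an independent sequence of frequency bins; in the time direction, every frequency bin is an independent sequence of frames. The `(channels, freq, time)` layout and the function names are hypothetical, and the convolutional blocks that would alternate with these steps are omitted.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over axis 1 of x: (batch, seq, dim)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def attend_along(feat, axis, wq, wk, wv):
    """Apply residual self-attention along 'freq' or 'time' of a (channels, freq, time) map.

    The other axis is folded into the batch, so for 'freq' each frame is processed
    independently, and for 'time' each frequency bin is processed independently.
    """
    if axis == "freq":
        seq = feat.transpose(2, 1, 0)  # (time, freq, channels): batch over frames
    elif axis == "time":
        seq = feat.transpose(1, 2, 0)  # (freq, time, channels): batch over bins
    else:
        raise ValueError(f"unknown axis: {axis}")
    out = seq + self_attention(seq, wq, wk, wv)  # residual connection
    if axis == "freq":
        return out.transpose(2, 1, 0)
    return out.transpose(2, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, n_freq, n_time = 8, 16, 20
    feat = rng.standard_normal((channels, n_freq, n_time))
    wq, wk = rng.standard_normal((2, channels, 4))
    wv = rng.standard_normal((channels, channels)) / np.sqrt(channels)
    # Weights are shared between the two directions here only for brevity
    out = attend_along(attend_along(feat, "freq", wq, wk, wv), "time", wq, wk, wv)
    print(out.shape)  # (8, 16, 20)
```

Folding one axis into the batch keeps each attention step's cost quadratic in only the frequency or only the time length, rather than in their product.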
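Without a DBN, the frame-wise beat and downbeat activations still have to be turned into event times. A minimal sketch of such lightweight postprocessing, under assumed parameters (50 frames per second, a 0.5 threshold, local maxima within plus or minus 3 frames): keep frames that are local maxima above the threshold, convert frame indices to seconds, and snap each downbeat to its nearest beat so that downbeats remain a subset of beats. The function names and parameter values are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def pick_peaks(prob, threshold=0.5, radius=3):
    """Frame indices that are local maxima within +-radius frames and exceed threshold."""
    prob = np.asarray(prob, dtype=float)
    padded = np.pad(prob, radius, mode="constant", constant_values=-np.inf)
    local_max = np.lib.stride_tricks.sliding_window_view(padded, 2 * radius + 1).max(axis=1)
    peaks = np.flatnonzero((prob == local_max) & (prob > threshold))
    # A flat-topped peak yields several adjacent candidates; drop candidates that lie
    # within `radius` frames of the previous candidate
    if peaks.size:
        peaks = peaks[np.concatenate(([True], np.diff(peaks) > radius))]
    return peaks


def activations_to_times(beat_prob, downbeat_prob, fps=50.0, threshold=0.5, radius=3):
    """Convert beat/downbeat activations to times in seconds, snapping downbeats to beats."""
    beats = pick_peaks(beat_prob, threshold, radius) / fps
    downbeats = pick_peaks(downbeat_prob, threshold, radius) / fps
    if beats.size and downbeats.size:
        nearest = np.abs(downbeats[:, None] - beats[None, :]).argmin(axis=1)
        downbeats = np.unique(beats[nearest])
    return beats, downbeats


if __name__ == "__main__":
    beat_prob = np.zeros(100)
    beat_prob[[10, 35, 60, 85]] = 0.9  # four beats, 25 frames (0.5 s) apart at 50 fps
    downbeat_prob = np.zeros(100)
    downbeat_prob[[11, 86]] = 0.8      # downbeats detected one frame late
    beats, downbeats = activations_to_times(beat_prob, downbeat_prob)
    print(beats)      # [0.2 0.7 1.2 1.7]
    print(downbeats)  # [0.2 1.7]
```

Unlike a DBN, this step imposes no tempo range or meter, so any periodicity or irregularity in the output comes from the network itself.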
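The F1 score mentioned above is, in standard beat-tracking evaluation, computed by matching each estimated beat to at most one annotated beat within a tolerance window of plus or minus 70 ms. The pure-Python sketch below uses a greedy nearest-match rule, a simplification of the optimal one-to-one matching used by toolkits such as mir_eval, and omits any trimming of initial beats that some evaluation protocols apply. Because F1 only counts individual hits, it says nothing about whether the correct beats form a continuous sequence, which is what continuity metrics such as CMLt and AMLt assess.

```python
def beat_f1(reference, estimated, tolerance=0.07):
    """F-measure between annotated and estimated beat times (seconds), +-tolerance window.

    Greedy one-to-one matching: each reference beat takes the closest still-unmatched
    estimate within the window.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    if not reference or not estimated:
        return 0.0
    used = [False] * len(estimated)
    hits = 0
    for ref in reference:
        best, best_dist = None, tolerance
        for j, est in enumerate(estimated):
            dist = abs(est - ref)
            if not used[j] and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            used[best] = True
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [1.0, 1.5, 2.0, 2.5]
    print(beat_f1(ref, [1.02, 1.55, 2.2, 2.5]))    # 0.75: three of four beats within 70 ms
    print(beat_f1(ref, [t + 0.25 for t in ref]))  # 0.0: off-beat predictions score nothing
```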