Abstract: We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.
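
The abstract does not say what replaces the DBN once it is removed. As a rough illustration only, the sketch below assumes plain thresholded peak picking on frame-wise beat activations. The function name `pick_peaks`, the 50 frames-per-second rate, the 0.5 threshold, and the 3-frame neighbourhood are all hypothetical choices for this example, not values taken from the paper.

```python
def pick_peaks(activations, frame_rate=50.0, threshold=0.5, radius=3):
    """Return beat times in seconds from frame-wise activations.

    Assumption: a frame counts as a beat if its activation reaches the
    threshold and it is the first maximum within +/- radius frames.
    No tempo or meter model is involved, unlike DBN postprocessing.
    """
    times = []
    n = len(activations)
    for i, value in enumerate(activations):
        if value < threshold:
            continue
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = activations[lo:hi]
        # Keep frame i only if it is the first maximum of its neighbourhood,
        # so that a flat plateau yields a single beat.
        if value == max(window) and window.index(value) == i - lo:
            times.append(i / frame_rate)
    return times


# Hypothetical activations with one clear peak and one two-frame plateau.
example = [0.1, 0.2, 0.9, 0.3, 0.1, 0.0, 0.1, 0.6, 0.8, 0.8, 0.2, 0.1]
beats = pick_peaks(example)
```

Because each peak is judged only against its immediate neighbours, nothing forces the output to be periodic. That is consistent with the abstract's remark that the system performs worse on continuity metrics.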

What problem does this paper attempt to address?

This paper aims to improve the accuracy and generality of beat and downbeat tracking, especially for music with complex rhythmic changes and uncommon time signatures. Specifically, it focuses on the following aspects:

1. **Removing DBN postprocessing**: The paper proposes a new system that improves generality and accuracy by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing step. A DBN constrains the meter and tempo, so it may fail on pieces with time signature changes, tempos outside a specific range, or a number of beats per bar that is not in the list of supported values.
2. **Improving the generalization ability of the model**: The researchers trained on multiple datasets, including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations. This enables the model to adapt better to different types of music.
3. **Improving the loss function and model architecture**: To improve accuracy, the researchers developed a loss function that is tolerant to small time shifts of annotations, and designed an architecture that alternates convolutions with transformers over either the frequency or the time dimension. This architecture helps to capture complex features in the music signal.
4. **Dealing with weaker performance on continuity metrics**: Although the proposed system outperforms existing methods in F1 score, it performs worse on continuity metrics (such as CMLt and AMLt). The researchers explored possible reasons, including that the loss function does not specifically penalize non-periodic predictions and that the datasets contain some non-periodic annotations.
5. **Open-source code and preprocessed datasets**: To promote further research, the authors released their model, code, and preprocessed datasets, inviting other researchers to try to surpass this result.

In summary, the main goal of this paper is to develop an accurate and general beat and downbeat tracking system that does not rely on DBN postprocessing, thereby improving performance across diverse music types and providing a strong foundation for subsequent research.
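
The idea of a loss that tolerates small time shifts of annotations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: the function name `shift_tolerant_bce`, the 3-frame tolerance, and the choice to skip negative frames near an annotation are all hypothetical. Each annotated beat is scored against the best prediction within the tolerance window, and frames inside any window are excluded from the negative term.

```python
import math


def shift_tolerant_bce(probs, targets, tolerance=3, eps=1e-7):
    """Binary cross-entropy that forgives small annotation shifts.

    probs:   per-frame beat probabilities in [0, 1]
    targets: per-frame labels, 1 at annotated beat frames, else 0
    Assumption: a prediction within +/- tolerance frames of an
    annotation counts as hitting that annotation.
    """
    n = len(probs)
    beat_frames = [i for i, t in enumerate(targets) if t == 1]
    loss = 0.0
    count = 0
    # Positive term: use the strongest prediction near each annotation.
    for b in beat_frames:
        lo, hi = max(0, b - tolerance), min(n, b + tolerance + 1)
        best = max(probs[lo:hi])
        loss += -math.log(max(best, eps))
        count += 1
    # Negative term: only frames outside every tolerance window.
    for i, p in enumerate(probs):
        if any(abs(i - b) <= tolerance for b in beat_frames):
            continue
        loss += -math.log(max(1.0 - p, eps))
        count += 1
    return loss / count if count else 0.0


# One annotated beat at frame 5; the prediction peaks on time, one frame
# late, or far away at frame 10.
targets = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
on_time = [0.05] * 11
on_time[5] = 0.9
one_late = [0.05] * 11
one_late[6] = 0.9
far_off = [0.05] * 11
far_off[10] = 0.9

loss_on_time = shift_tolerant_bce(on_time, targets)
loss_one_late = shift_tolerant_bce(one_late, targets)
loss_far_off = shift_tolerant_bce(far_off, targets)
```

In this example the on-time and one-frame-late predictions receive the same loss, while the far-off prediction is penalized more heavily. Note that nothing in this loss rewards evenly spaced predictions, which matches the summary's point that the loss does not specifically penalize non-periodic output.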

Beat this! Accurate beat tracking without DBN postprocessing

Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music

LC-Beating: An Online System for Beat and Downbeat Tracking using Latency-Controlled Mechanism

BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking

BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer

Deep Learning-Based Automatic Downbeat Tracking: A Brief Review

Musical Score Following and Audio Alignment

Adapting Meter Tracking Models to Latin American Music

Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning

SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System

Towards Reliable Real-time Opera Tracking: Combining Alignment with Audio Event Detectors to Increase Robustness

Basic Evaluation of Auditory Temporal Stability (beats): A Novel Rationale and Implementation

A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

Just Label the Repeats for In-The-Wild Audio-to-Score Alignment

All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio

Improving Real-time Score Following in Opera by Combining Music with Lyrics Tracking

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval

Everybody Compose: Deep Beats To Music

HeartBEAT: Heart Beat Estimation through Adaptive Tracking

Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips