Abstract: Sheet music recognition is a vital technology for converting printed or handwritten musical scores into digital, machine-readable formats. It makes musical compositions more accessible for editing, performance, learning, and sharing, thereby fostering music education, composition, and culture, and it provides a powerful tool for music analysis, research, and preservation. Our aim is a sheet music recognition method that offers a simple workflow, high recognition accuracy, and fast model convergence. To this end, we propose the Deep Multilevel Cascade Residual Recurrent (MCRR) framework for sheet music recognition, which consists of the following components. First, we add Gaussian white noise and Perlin noise and apply elastic deformations such as rotation and stretching to simulate real-world distortions in sheet music images; this augments the dataset, enhances model robustness, and mitigates overfitting. Second, in the feature extraction phase, we employ a residual convolutional neural network (ConvNet) to counter the degradation problem of deep networks, and we use multilevel cascade fusion to gather comprehensive feature information, which improves the model's feature extraction capability and reduces recognition errors. Third, for note recognition, we use the Simple Recurrent Unit (SRU), a recurrent neural network (RNN) variant that moves most of its computation into parallelizable operations, speeding up model convergence. Finally, we combine the SRU with the Connectionist Temporal Classification (CTC) loss function, which removes the need for strict alignment between input data and labels, to perform note classification and recognition.
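How CTC removes the alignment requirement can be illustrated with a minimal pure-Python sketch. It is not the MCRR implementation: it assumes that the recurrent layer (for example, the SRU) emits per-frame class probabilities as plain lists, it reserves class index 0 for the CTC blank, and all function names (`collapse`, `ctc_neg_log_likelihood`, `brute_force_probability`, `ctc_greedy_decode`) are hypothetical. The forward recursion sums the probabilities of every frame-level alignment that collapses to the target symbol sequence, so the labels never have to be aligned to image columns; the brute-force function computes the same quantity by explicit enumeration as a check.

```python
import math
from itertools import product

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def collapse(path, blank=BLANK):
    """CTC many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_neg_log_likelihood(probs, labels, blank=BLANK):
    """-log p(labels | input), summed over every frame-level alignment.

    probs:  T frames, each a list of per-class probabilities
            (e.g. softmax outputs of a recurrent layer such as an SRU).
    labels: target symbol indices, without blanks.
    """
    # Extended target with blanks interleaved: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for symbol in labels:
        ext += [symbol, blank]
    S = len(ext)

    # alpha[s] = total probability of all alignment prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same position
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]         # skip a blank between distinct labels
            new[s] = total * frame[ext[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank.
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p)  # ValueError if no alignment fits in T frames


def brute_force_probability(probs, labels, blank=BLANK):
    """Same quantity by enumerating all paths (only feasible for tiny inputs)."""
    n_classes = len(probs[0])
    return sum(
        math.prod(probs[t][k] for t, k in enumerate(path))
        for path in product(range(n_classes), repeat=len(probs))
        if collapse(path, blank) == list(labels)
    )


def ctc_greedy_decode(probs, blank=BLANK):
    """Best-path decoding: most likely class per frame, then collapse."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return collapse(best, blank)


if __name__ == "__main__":
    frames = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.4, 0.1, 0.5], [0.3, 0.3, 0.4]]
    print(ctc_greedy_decode(frames))  # [1, 2]
    # The forward recursion and the enumeration give the same probability.
    print(math.exp(-ctc_neg_log_likelihood(frames, [1, 2])),
          brute_force_probability(frames, [1, 2]))
```

The sketch multiplies raw probabilities, which underflows on long sequences; a framework implementation such as PyTorch's `torch.nn.CTCLoss`, which takes log-probabilities, would normally be used for training.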
Extensive ablation experiments and comparative analyses, including visualizations, illustrative examples, and quantitative assessments, confirm the effectiveness of the proposed method and its superiority over various state-of-the-art methods. The method achieves promising results on both the PrIMuS and Camera-PrIMuS datasets. On PrIMuS, it obtains an SeER (Sequence Error Rate) of 1.4571% and a SyER (Symbol Error Rate) of 0.3234%. It is also accurate at the component level, reaching approximately 97% pitch and type accuracy and around 94% note accuracy, and training takes 0.56 seconds per epoch. On Camera-PrIMuS, the results are slightly lower but still competitive: an SeER of 5.1488%, a SyER of 1.0612%, pitch and type accuracies of around 90%, and a note accuracy of approximately 88%, with training taking 1.93 seconds per epoch. Furthermore, we compare our method with the commercial software Capella-scan, PhotoScore, and SmartScore. Among these, Capella-scan performs best, but it is less robust than the proposed method.
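The two error rates reported above can be made concrete with a short sketch. It assumes the PrIMuS-style definitions, in which SeER is the percentage of staves whose predicted symbol sequence contains at least one error, and SyER is the number of edit operations (insertions, deletions, substitutions) needed to turn the predictions into the references, divided here by the total reference length. The function names and the token strings in the example are hypothetical and illustrative only; the pitch, type, and note accuracies are not covered.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def sequence_error_rate(refs, hyps):
    """SeER (%): share of sequences with at least one wrong symbol."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return 100.0 * wrong / len(refs)


def symbol_error_rate(refs, hyps):
    """SyER (%): total edit operations divided by total reference length."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * edits / sum(len(r) for r in refs)


# Illustrative staves in a PrIMuS-like token notation (hypothetical data).
example_refs = [["clef-G2", "note-C4_quarter", "barline"],
                ["clef-F4", "note-A2_half"]]
example_hyps = [["clef-G2", "note-C4_quarter", "barline"],
                ["clef-F4", "note-B2_half"]]

if __name__ == "__main__":
    print(sequence_error_rate(example_refs, example_hyps))  # 50.0
    print(symbol_error_rate(example_refs, example_hyps))    # 20.0
```

Because one wrong symbol is enough to make a whole staff count as an error, the sequence-level rate is normally much higher than the symbol-level rate, as in the example.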

Coordinate Embedding Transformer Model for Optical Music Recognition on Monophonic Scores

TrOMR: Transformer-Based Polyphonic Optical Music Recognition

Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription

Research of Numbered Musical Notation Recognition Method

Toward a More Complete OMR Solution

Score Transformer: Generating Musical Score from Note-level Representation

Improved Feature Pyramid Convolutional Neural Network for Effective Recognition of Music Scores

A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

Real-Time Optical Music Recognition System for Dulcimer Musical Robot

A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model

End-to-end Piano Performance-MIDI to Score Conversion with Transformers

Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation

Practical End-to-End Optical Music Recognition for Pianoform Music

Understanding Optical Music Recognition

Advancing Handwritten Musical Notation Recognition Using Deep Learning: A Convolutional Neural Network-Based Approach with Improved Accuracy

Deep Multilevel Cascade Residual Recurrent Framework (MCRR) for Sheet Music Recognition

MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers

End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music

A Hybrid Parallel Computing Architecture Based on CNN and Transformer for Music Genre Classification