Abstract: In recent years, advances in neural network design and the availability of large-scale labeled datasets have significantly improved the accuracy of piano transcription models. However, most previous work has focused on high-performance offline transcription and has given little deliberate consideration to model size. The goal of this work is to enable real-time inference for piano transcription with a model that is both accurate and lightweight. To this end, we propose novel convolutional recurrent neural network architectures by redesigning an existing autoregressive piano transcription model. First, we extend the acoustic module by adding a frequency-conditioned FiLM layer to the CNN module, which adapts the convolutional filters along the frequency axis. Second, we improve note-state sequence modeling with a pitchwise LSTM that focuses on note-state transitions within a note. In addition, we augment the autoregressive connection with an enhanced recursive context. Using these components, we propose two types of models: one for high performance and the other for high compactness. Through extensive experiments, we show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset. We also investigate the effective model size and real-time inference latency by gradually streamlining the architecture. Finally, we conduct a cross-dataset evaluation on unseen piano datasets and an in-depth analysis to elucidate the effect of the proposed components with respect to note length and pitch range.
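
To make the idea of frequency-conditioned FiLM more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a feature map of shape (channels, time, freq) and a hypothetical linear conditioning layer that maps each bin's normalized frequency position to a per-channel scale (gamma) and shift (beta). The function name `frequency_film` and all parameter names are illustrative assumptions; the conditioning network in the paper may differ.

```python
import numpy as np

def frequency_film(x, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM (y = gamma * x + beta) with gamma/beta depending on the frequency bin.

    x: feature map of shape (channels, time, freq).
    w_*, b_*: weights and biases of a hypothetical linear conditioning layer,
              each of shape (channels,). These are assumptions for illustration.
    """
    _, _, n_freq = x.shape
    # Conditioning input: normalized frequency position in [0, 1], shape (freq,).
    pos = np.linspace(0.0, 1.0, n_freq)
    # Per-channel, per-frequency scale and shift, each of shape (channels, freq).
    gamma = w_gamma[:, None] * pos[None, :] + b_gamma[:, None]
    beta = w_beta[:, None] * pos[None, :] + b_beta[:, None]
    # Broadcast over the time axis so every frame shares the same modulation.
    return gamma[:, None, :] * x + beta[:, None, :]

# Example: 4 channels, 10 frames, 88 frequency bins (one per piano key, as an assumption).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10, 88))
w_gamma, b_gamma = rng.standard_normal(4), np.ones(4)
w_beta, b_beta = rng.standard_normal(4), np.zeros(4)
y = frequency_film(x, w_gamma, b_gamma, w_beta, b_beta)

# With zero weights, unit scale, and zero shift, the layer reduces to the identity.
identity = frequency_film(x, np.zeros(4), np.ones(4), np.zeros(4), np.zeros(4))
```

Because gamma and beta vary with the bin index, the same convolutional features can be rescaled differently in low and high registers. This is one plausible reading of "adapting the convolutional filters on the frequency axis."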
