Abstract: In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work has focused on high-performance offline transcription and has given little deliberate consideration to model size. The goal of this work is to implement real-time inference for piano transcription while keeping the model both accurate and lightweight. To this end, we propose novel convolutional recurrent neural network architectures by redesigning an existing autoregressive piano transcription model. First, we extend the acoustic module by adding a frequency-conditioned FiLM layer to the CNN module to adapt the convolutional filters along the frequency axis. Second, we improve note-state sequence modeling by using a pitchwise LSTM that focuses on note-state transitions within a note. In addition, we augment the autoregressive connection with an enhanced recursive context. Using these components, we propose two types of models: one for high performance and the other for high compactness. Through extensive experiments, we show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset. We also investigate the effective model size and real-time inference latency by gradually streamlining the architecture. Finally, we conduct cross-dataset evaluation on unseen piano datasets and an in-depth analysis to elucidate the effect of the proposed components with respect to note length and pitch range.
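To make the frequency-conditioned FiLM idea concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes that the per-channel scale (gamma) and shift (beta) are produced by a small, randomly initialised network from the normalised position of each frequency bin; the abstract does not say how these parameters are generated. The names `make_film_params` and `apply_film`, the hidden size, and the `[channel][frequency][time]` feature layout are all hypothetical.

```python
import math
import random


def make_film_params(num_freq_bins, num_channels, hidden=8, seed=0):
    """Map each normalised frequency position to per-channel (gamma, beta).

    Assumption: a tiny one-hidden-layer network with random weights stands in
    for whatever conditioning network a trained model would learn.
    """
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)]
          for _ in range(2 * num_channels)]

    gammas, betas = [], []
    for f in range(num_freq_bins):
        pos = f / max(num_freq_bins - 1, 1)  # frequency position in [0, 1]
        h = [math.tanh(w * pos + b) for w, b in zip(w1, b1)]
        out = [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]
        gammas.append([1.0 + g for g in out[:num_channels]])  # scale near 1
        betas.append(out[num_channels:])
    return gammas, betas  # both indexed as [frequency][channel]


def apply_film(features, gammas, betas):
    """Apply y[c][f][t] = gamma[f][c] * x[c][f][t] + beta[f][c]."""
    return [
        [
            [gammas[f][c] * value + betas[f][c] for value in freq_row]
            for f, freq_row in enumerate(channel)
        ]
        for c, channel in enumerate(features)
    ]


# Toy CNN feature map: 2 channels, 3 frequency bins, 4 time frames.
features = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
gammas, betas = make_film_params(num_freq_bins=3, num_channels=2)
modulated = apply_film(features, gammas, betas)
```

Because the modulation depends on the frequency index but not on time, the same convolutional feature is scaled and shifted differently in each frequency bin, which is one way to read "adapting the convolutional filters along the frequency axis".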

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models

DAFE-MSGAT: Dual-Attention Feature Extraction and Multi-Scale Graph Attention Network for Polyphonic Piano Transcription

Towards Musically Informed Evaluation of Piano Transcription Models

AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

Reconstructing Human Expressiveness in Piano Performances with a Transformer Network

MFAE: Masked frame-level autoencoder with hybrid-supervision for low-resource music transcription

On the Potential of Simple Framewise Approaches to Piano Transcription

Automatic Assessment of Piano Performances Using Timbre and Pitch Features

A study of audio mixing methods for piano transcription in violin-piano ensembles

Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks

Multi-Instrument Polyphonic Melody Transcription Based on Deep Learning