Abstract: pyAMPACT (Python-based Automatic Music Performance Analysis and Comparison Toolkit) links symbolic and audio music representations to facilitate score-informed estimation of performance data in audio as well as general linking of symbolic and audio music representations with a variety of annotations. pyAMPACT can read a range of symbolic formats and can output note-linked audio descriptors/performance data into MEI-formatted files. The audio analysis uses score alignment to calculate time-frequency regions of importance for each note in the symbolic representation from which to estimate a range of parameters. These include tuning-, dynamics-, and timbre-related performance descriptors, with timing-related information available from the score alignment. Beyond performance data estimation, pyAMPACT also facilitates multi-modal investigations through its infrastructure for linking symbolic representations and annotations to audio.

What problem does this paper attempt to address?

This paper addresses the problem of aligning symbolic representations (music scores) with audio so that performance data can be estimated and multimodal processing supported. Specifically, the pyAMPACT toolkit aims to:

1. **Align symbolic music representations with audio representations**: Linking symbolic music (such as scores) to audio enables score-informed audio analysis, so that the performance data corresponding to each score note can be extracted from the audio.
2. **Estimate performance data**: Estimate a range of performance-related parameters from the audio, including pitch, dynamics, and timbre. These parameters help researchers study the nuances of a music performance.
3. **Support multimodal processing**: Combine symbolic representations, audio, and other annotations to support more complex multimodal analysis tasks. For example, information from sources such as scores, audio, and motion-capture data of performers can be integrated for research.
4. **Expand functionality and compatibility**: Compared with previous tools (such as AMPACT), pyAMPACT supports more symbolic formats (such as Humdrum, MEI, MIDI, and MusicXML) and can read multiple annotation encoding formats (such as Dezrann and Humdrum analysis spines). It also provides more powerful visualization and data export functions.

### Solutions to specific problems

- **Alignment of symbols and audio**: The Dynamic Time Warping (DTW) algorithm aligns the symbolic and audio representations, from which the time-frequency region of each note is estimated.
- **Estimation of performance parameters**:
  - **Pitch-related descriptors**: These include the mean fundamental frequency \( f_0 \), perceived pitch, vibrato rate, and vibrato depth.
    - The mean fundamental frequency \( f_0 \) is calculated as the geometric mean of the \( N \) frame-level estimates \( f_{0,i} \): \[ \text{mean } f_0=\left(\prod_{i = 1}^{N} f_{0,i}\right)^{\frac{1}{N}} \]
    - The perceived pitch is calculated as a weighted average: \[ \text{perceived pitch}=\sum_{i = 1}^{N} w_i\cdot f_{0,i} \] where the weights \( w_i \) depend on the rate of change of the frequency and include a \( \gamma \) value that controls the dynamic range of the weights.
    - The vibrato depth \( E \) and vibrato rate \( R \) are calculated as: \[ E = 2\cdot\max_k|X(k)| \] \[ R = f_n\cdot\arg\max_k|X(k)| \]
  - **Dynamics-related descriptors**: These include the mean power and shimmer, where the mean power is the arithmetic mean of the frame-level power values \( P_i \): \[ \text{mean power}=\frac{1}{N}\sum_{i = 1}^{N} P_i \]
  - **Timbre-related descriptors**: Various spectral features (such as bandwidth, centroid, contrast, flatness, and roll-off) are estimated from the harmonic spectral representation, and the arithmetic mean of each is taken as the summary descriptor.
- **Multimodal processing**: By supporting the import of multiple annotation formats and providing flexible data export functions, pyAMPACT provides the infrastructure for multimodal processing.

In general, pyAMPACT aims to provide a more powerful and flexible tool for music information retrieval and musicology research by improving the alignment of symbols and audio and the estimation of performance parameters.
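The alignment step described above can be illustrated with a minimal textbook DTW in NumPy. This is a sketch for intuition only, not pyAMPACT's implementation: the function name `dtw_path`, the Euclidean frame distance, and the three-step pattern (diagonal, vertical, horizontal) are all assumptions made for the example.

```python
import numpy as np

def dtw_path(score_feats, audio_feats):
    """Align two feature sequences of shape (n, d) and (m, d).

    Hypothetical helper: returns (total_cost, path), where path is a list of
    (score_index, audio_index) pairs from the first frames to the last frames.
    """
    score_feats = np.asarray(score_feats, dtype=float)
    audio_feats = np.asarray(audio_feats, dtype=float)
    n, m = len(score_feats), len(audio_feats)

    # Pairwise Euclidean distance between every score frame and audio frame.
    dist = np.linalg.norm(score_feats[:, None, :] - audio_feats[None, :, :], axis=2)

    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path

# Toy example: three score events, five audio frames (first and last held longer).
score = [[0.0], [1.0], [2.0]]
audio = [[0.0], [0.0], [1.0], [2.0], [2.0]]
cost, path = dtw_path(score, audio)
```

In this toy case the path maps score event 0 to audio frames 0 and 1, event 1 to frame 2, and event 2 to frames 3 and 4, which is the kind of note-to-time mapping from which per-note analysis regions could then be derived.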
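The pitch and dynamics formulas above can be sketched in a few lines of NumPy. Several details are assumptions, because the summary does not specify them: the weight form \( w_i \propto e^{-\gamma |\Delta f_{0,i}|} \) (normalized to sum to one), the reading of \( X(k) \) as the DFT of the mean-removed \( f_0 \) trace divided by \( N \), and the reading of \( f_n \) as the DFT bin spacing (frame rate divided by \( N \)). All function names are hypothetical and are not pyAMPACT's API.

```python
import numpy as np

def mean_f0(f0):
    """Geometric mean of a frame-level f0 trace, computed in the log domain."""
    f0 = np.asarray(f0, dtype=float)
    return float(np.exp(np.mean(np.log(f0))))

def perceived_pitch(f0, gamma=1.0):
    """Weighted average of f0 that down-weights rapidly changing frames.

    ASSUMPTION: weights are exp(-gamma * |forward difference of f0|),
    normalized to sum to one. The source only says the weights depend on
    the rate of change and on gamma.
    """
    f0 = np.asarray(f0, dtype=float)
    rate = np.abs(np.diff(f0, append=f0[-1]))  # last frame gets rate 0
    w = np.exp(-gamma * rate)
    w /= w.sum()
    return float(np.sum(w * f0))

def vibrato(f0, frame_rate):
    """Return (depth E, rate R) from the spectrum of the f0 trace.

    ASSUMPTION: X(k) is the DFT of the mean-removed trace divided by N,
    and f_n is the bin spacing frame_rate / N.
    """
    f0 = np.asarray(f0, dtype=float)
    n = len(f0)
    spectrum = np.abs(np.fft.rfft(f0 - f0.mean())) / n
    k = int(np.argmax(spectrum))
    f_n = frame_rate / n
    return float(2.0 * spectrum[k]), float(f_n * k)

def mean_power(power):
    """Arithmetic mean of frame-level power values."""
    return float(np.mean(power))

# Synthetic note: 440 Hz with a 6 Hz vibrato of amplitude 5 Hz, 2 s at 100 frames/s.
frame_rate = 100.0
t = np.arange(200) / frame_rate
trace = 440.0 + 5.0 * np.sin(2 * np.pi * 6.0 * t)
depth, rate = vibrato(trace, frame_rate)

# A single outlier frame barely moves the perceived pitch under these weights.
glitchy = [440.0, 440.0, 440.0, 500.0, 440.0, 440.0, 440.0]
pp = perceived_pitch(glitchy)
```

Under these assumptions, the synthetic trace yields a depth equal to the sinusoid's amplitude and a rate equal to its modulation frequency, while the weighted average of the glitchy trace stays near 440 Hz, unlike its arithmetic mean.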

pyAMPACT: A Score-Audio Alignment Toolkit for Performance Data Estimation and Multi-modal Processing
