MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
2023-10-15
Abstract: Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresses data scarcity and class imbalance challenges. Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Additionally, we apply a post-processing approach for event-level prediction, where an IPT activation initiates an event only if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Further experiments demonstrate the efficacy of multi-task finetuning on each IPT class.
Sound, Artificial Intelligence, Machine Learning, Multimedia, Audio and Speech Processing
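The abstract does not state how the IPT, pitch, and onset objectives are combined during multi-task finetuning. A common choice is a weighted sum of per-task losses, and the sketch below illustrates that idea under explicit assumptions: every head is trained with frame-wise binary cross-entropy on flattened (frame, class) predictions, and the weights `w_pitch` and `w_onset` are hypothetical hyperparameters. It is not the authors' exact objective.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over flattened (frame, class) predictions."""
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(probs)


def multitask_loss(
    ipt_probs, ipt_targets,      # main task: frame-wise IPT labels
    pitch_probs, pitch_targets,  # auxiliary task: frame-wise pitch activations
    onset_probs, onset_targets,  # auxiliary task: frame-wise onset labels
    w_pitch: float = 1.0,        # hypothetical auxiliary-loss weights
    w_onset: float = 1.0,
) -> float:
    """Weighted sum of per-task losses (an assumed combination rule)."""
    return (
        bce(ipt_probs, ipt_targets)
        + w_pitch * bce(pitch_probs, pitch_targets)
        + w_onset * bce(onset_probs, onset_targets)
    )
```

Setting both auxiliary weights to zero reduces this to single-task IPT finetuning, which makes it easy to compare the two settings.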
What problem does this paper attempt to address?
The paper addresses two obstacles to automatic instrument playing technique (IPT) detection: scarce labeled data and class imbalance. It makes the following contributions:

1. **Self-supervised pretrained models**: To tackle data scarcity and class imbalance, the authors fine-tune a self-supervised learning (SSL) model, pre-trained on large-scale unlabeled music data, for IPT detection. This approach mitigates the difficulties caused by insufficient data and imbalanced classes.
2. **Multi-task fine-tuning**: Because pitch helps capture the subtle differences between IPTs and onsets help locate IPT events, the paper adds pitch detection and IPT onset detection as auxiliary tasks during fine-tuning. Experiments on two IPT datasets with pitch annotations demonstrate the effectiveness of this multi-task setup.
3. **Onset-gated post-processing**: An IPT activation triggers an event only if the onset output confirms an onset in that frame. This post-processing step significantly improves event-level evaluation metrics.

With these methods, the paper outperforms existing approaches on multiple IPT benchmark datasets and validates the method's effectiveness and generalization on datasets covering different instruments (Guzheng, guitar, and Chinese bamboo flute). It also reports the impact of multi-task learning and transfer learning on each individual IPT class.
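The onset-gated post-processing rule can be sketched as follows: for each IPT class, an event may begin only on a frame where the IPT activation and the onset output are both above threshold, and it ends when the IPT activation drops. The function name `onset_gated_events`, the 0.5 thresholds, and the `(class_index, start_frame, end_frame_exclusive)` event format are hypothetical choices for illustration. The sketch also assumes that a later onset inside a still-active segment does not split the event, a detail the summary leaves open.

```python
from typing import List, Tuple

Event = Tuple[int, int, int]  # (class_index, start_frame, end_frame_exclusive)


def onset_gated_events(
    ipt_probs: List[List[float]],  # [n_frames][n_classes] IPT activations
    onset_probs: List[float],      # [n_frames] onset activations
    ipt_threshold: float = 0.5,    # hypothetical threshold
    onset_threshold: float = 0.5,  # hypothetical threshold
) -> List[Event]:
    """Turn frame-wise activations into events, gated by the onset output."""
    n_frames = len(ipt_probs)
    n_classes = len(ipt_probs[0]) if n_frames else 0
    events: List[Event] = []
    for c in range(n_classes):
        start = None  # start frame of the currently open event, if any
        for t in range(n_frames):
            active = ipt_probs[t][c] >= ipt_threshold
            if start is None:
                # An event may only begin on a frame the onset output confirms.
                if active and onset_probs[t] >= onset_threshold:
                    start = t
            elif not active:
                # The IPT activation dropped: close the open event.
                events.append((c, start, t))
                start = None
        if start is not None:
            # Close an event that is still active at the last frame.
            events.append((c, start, n_frames))
    return events
```

For example, with IPT activations `[[0.1], [0.9], [0.9], [0.2], [0.8], [0.8]]` and onset activations `[0.0, 0.9, 0.1, 0.0, 0.1, 0.1]`, only frames 1 to 2 form an event; frames 4 and 5 are active but never confirmed by an onset, so they are discarded.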