Episodic fine-tuning prototypical networks for optimization-based few-shot learning: Application to audio classification

Xuanyu Zhuang, Geoffroy Peeters, Gaël Richard
2024-10-04
Abstract: The Prototypical Network (ProtoNet) has emerged as a popular choice in Few-shot Learning (FSL) scenarios due to its remarkable performance and straightforward implementation. Building upon such success, we first propose a simple (yet novel) method to fine-tune a ProtoNet on the (labeled) support set of a C-way-K-shot test episode (without using the query set, which is only used for evaluation). We then propose an algorithmic framework that combines ProtoNet with optimization-based FSL algorithms (MAML and Meta-Curvature) to work with such a fine-tuning method. Since optimization-based algorithms endow the target learner model with the ability to adapt quickly to only a few samples, we utilize ProtoNet as the target model to enhance its fine-tuning performance with the help of a specifically designed episodic fine-tuning strategy. The experimental results confirm that our proposed models, MAML-Proto and MC-Proto, combined with our unique fine-tuning method, outperform regular ProtoNet by a large margin in few-shot audio classification tasks on the ESC-50 and Speech Commands v2 datasets. We note that although we have only applied our model to the audio domain, it is a general method and can be easily extended to other domains.
Audio and Speech Processing, Machine Learning, Multimedia, Sound, Signal Processing
What problem does this paper attempt to address?
The paper addresses how to improve few-shot audio classification by fine-tuning a Prototypical Network (ProtoNet) in the Few-shot Learning (FSL) setting. Specifically, the authors propose fine-tuning ProtoNet on the labeled support set at test time and combining it with optimization-based FSL algorithms (such as MAML and Meta-Curvature) to strengthen the model's ability to adapt quickly.

### Main problems

1. **Limitations of existing methods**: Existing FSL methods perform only moderately on audio classification, especially when the support set is used only for comparison and its potential is not fully exploited.
2. **Effectiveness of fine-tuning**: Directly fine-tuning ProtoNet may be ineffective or even harmful, especially when the updates are large, which makes over-fitting likely.
3. **Cross-domain applicability**: FSL research has mainly focused on images. Applying FSL methods to audio remains challenging, and their effectiveness and generalization ability there still need to be verified.

### Solutions

The authors propose a method named Rotational Division Fine-Tuning (RDFT), which divides the support set into sub-support sets and pseudo-query sets for fine-tuning ProtoNet. They also propose two new algorithms, MAML-Proto and MC-Proto. These combine ProtoNet with the optimization-based FSL frameworks (MAML and Meta-Curvature) and further improve the model's adaptation performance through a specially designed episodic fine-tuning strategy.

### Experimental verification

The authors conducted experiments on two public datasets (ESC-50 and Speech Commands v2). The results show that:

- Directly applying RDFT to the ordinary ProtoNet leads to a performance decline.
- Applying RDFT within the proposed MAML-Proto and MC-Proto frameworks significantly improves model performance, especially on the ESC-50 dataset.
- On the Speech Commands v2 dataset, the improvement is less pronounced than on ESC-50 but is still present.

### Conclusions

The paper improves the performance of ProtoNet in few-shot audio classification by introducing the RDFT method and combining ProtoNet with optimization-based FSL frameworks. The authors note that the method is general: although it was only evaluated on audio, it can be extended to other domains.

### Formula summary

- **Prototype calculation**:
  \[ c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i) \]
- **Query sample classification probability**:
  \[ p_\theta(y = k \mid x) = \frac{\exp(-d(f_\theta(x), c_k))}{\sum_{k'} \exp(-d(f_\theta(x), c_{k'}))} \]
- **MAML parameter update** (inner step for each task \(T_i\), then the outer meta-update):
  \[ \theta'_i = \theta - \alpha \nabla_\theta L_{T_i}(f_\theta) \]
  \[ \theta \leftarrow \theta - \beta \nabla_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta'_i}) \]
- **Meta-Curvature transformation**:
  \[ MC(G) = G \times_3 M_f \times_2 M_i \times_1 M_o \]

These formulas summarize the model's core components: prototype-based classification, the MAML inner and outer updates, and the Meta-Curvature transformation applied to the gradient \(G\).
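
The prototype and classification formulas above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The embeddings are hand-written 2-D vectors standing in for \(f_\theta(x)\), the class names are made up, and the distance \(d\) is assumed to be squared Euclidean, which the summary does not specify.

```python
import math
from collections import defaultdict


def prototypes(embeddings, labels):
    """Class prototype c_k: the mean of the support embeddings labeled k."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_euclidean(a, b):
    """Assumed choice for d; the summary leaves the distance unspecified."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(query, protos, distance=squared_euclidean):
    """Softmax over negative distances from the query embedding to each prototype."""
    logits = {k: -distance(query, c) for k, c in protos.items()}
    shift = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - shift) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Toy 2-way-2-shot episode; the vectors stand in for f_theta(x) (hypothetical data).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["dog", "dog", "rain", "rain"]

protos = prototypes(support, support_labels)
probs = class_probabilities([1.0, 1.0], protos)
predicted = max(probs, key=probs.get)
```

In this toy episode the query lies much closer to the "dog" prototype than to the "rain" prototype, so nearly all of the probability mass goes to "dog". In an actual ProtoNet the embeddings would come from a trained network, and the negative log of the true class probability would serve as the loss.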