Abstract: The Prototypical Network (ProtoNet) has emerged as a popular choice in Few-shot Learning (FSL) scenarios due to its remarkable performance and straightforward implementation. Building upon such success, we first propose a simple (yet novel) method to fine-tune a ProtoNet on the (labeled) support set of a C-way-K-shot test episode (without using the query set, which is only used for evaluation). We then propose an algorithmic framework that combines ProtoNet with optimization-based FSL algorithms (MAML and Meta-Curvature) to work with such a fine-tuning method. Since optimization-based algorithms endow the target learner model with the ability to adapt quickly to only a few samples, we utilize ProtoNet as the target model to enhance its fine-tuning performance with the help of a specifically designed episodic fine-tuning strategy. The experimental results confirm that our proposed models, MAML-Proto and MC-Proto, combined with our unique fine-tuning method, outperform regular ProtoNet by a large margin in few-shot audio classification tasks on the ESC-50 and Speech Commands v2 datasets. We note that although we have only applied our model to the audio domain, it is a general method and can be easily extended to other domains.
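The abstract's key constraint is that fine-tuning may use only the labeled support set of the test episode. The summary below names the procedure Rotational Division Fine-Tuning (RDFT) and says it splits the support set into sub-support sets and pseudo-query sets. The following is a minimal sketch of how such pseudo-episodes could be generated. The helper name `rotational_pseudo_episodes` is hypothetical. The leave-one-shot-out rotation, in which each shot of every class serves once as the pseudo-query, is an assumption and not the authors' published code.

```python
from collections import defaultdict


def rotational_pseudo_episodes(labels, k_shot):
    """Yield (sub_support, pseudo_query) lists of support-set indices.

    Hypothetical helper. Assumption: in rotation r, the r-th shot of every
    class is held out as the pseudo-query, and the remaining k_shot - 1
    shots of that class go into the sub-support set.
    """
    if k_shot < 2:
        raise ValueError("need at least 2 shots per class to hold one out")
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    if any(len(v) != k_shot for v in by_class.values()):
        raise ValueError("expected exactly k_shot samples per class")
    classes = sorted(by_class)
    for r in range(k_shot):
        # One pseudo-query per class: the r-th shot of that class.
        query = [by_class[c][r] for c in classes]
        # Every other shot of every class forms the sub-support set.
        support = [i for c in classes
                   for j, i in enumerate(by_class[c]) if j != r]
        yield support, query


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # toy 2-way 3-shot support set
    for support, query in rotational_pseudo_episodes(labels, k_shot=3):
        print(support, query)
    # [1, 2, 4, 5] [0, 3]
    # [0, 2, 3, 5] [1, 4]
    # [0, 1, 3, 4] [2, 5]
```

Each yielded pair would form one pseudo-episode. Prototypes are computed from the sub-support indices and the loss is evaluated on the pseudo-query indices, so the real query set is never touched. With K = 1 there is nothing to hold out, which is why the helper requires at least two shots per class.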

What problem does this paper attempt to address?

This paper addresses how to improve few-shot audio classification with the Prototypical Network (ProtoNet) by fine-tuning it more effectively. Specifically, the authors fine-tune ProtoNet on the labeled support set of each test episode. They combine it with optimization-based Few-shot Learning (FSL) algorithms (MAML and Meta-Curvature) to strengthen the model's ability to adapt quickly from a few samples.

### Main problems

1. **Limitations of existing methods**: Existing FSL methods achieve limited performance on audio classification. One reason is that they use the support set only for comparison (e.g., to compute class prototypes) and do not fully exploit it.
2. **Effectiveness of fine-tuning**: Directly fine-tuning ProtoNet on the support set can be ineffective or even harmful. This is especially true when the parameter updates are large, because the model then easily overfits the few available samples.
3. **Cross-domain applicability**: FSL research has mainly focused on images. Applying FSL methods to audio remains challenging, and their effectiveness and generalization in this domain still need to be verified.

### Solutions

The authors propose Rotational Division Fine-Tuning (RDFT), which divides the support set into sub-support sets and pseudo-query sets so that ProtoNet can be fine-tuned without using the query set. They also propose two new algorithms, MAML-Proto and MC-Proto. Both use ProtoNet as the target model inside an optimization-based FSL framework (MAML and Meta-Curvature, respectively) and further improve adaptation through a specially designed episodic fine-tuning strategy.

### Experimental verification

The authors ran experiments on two public datasets, ESC-50 and Speech Commands v2. The results show that:

- Applying RDFT directly to a plain ProtoNet degrades performance.
- Applying RDFT within the proposed MAML-Proto and MC-Proto frameworks improves performance significantly, especially on ESC-50.
- On Speech Commands v2 the improvement is smaller than on ESC-50, but still present.

### Conclusions

By introducing RDFT and combining ProtoNet with optimization-based FSL frameworks, the paper improves ProtoNet's performance on few-shot audio classification. The authors argue that the method is general and can be extended to other domains, although it has so far been evaluated only on audio.

### Formula summary

- **Prototype calculation**, where \( S_k \) is the set of support samples of class \( k \) and \( f_\theta \) is the embedding network:
  \[ c_k=\frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i) \]
- **Query sample classification probability**, where \( d \) is a distance function:
  \[ p_\theta(y = k \mid x)=\frac{\exp(-d(f_\theta(x), c_k))}{\sum_{k'} \exp(-d(f_\theta(x), c_{k'}))} \]
- **MAML parameter update**, with an inner (task-level) step of size \( \alpha \) and an outer (meta-level) step of size \( \beta \):
  \[ \theta_i'=\theta-\alpha \nabla_\theta L_{T_i}(f_\theta) \]
  \[ \theta \leftarrow \theta-\beta \nabla_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta_i'}) \]
- **Meta-Curvature transformation**, where \( G \) is the gradient tensor of a layer's weights and \( \times_n \) is the mode-\( n \) tensor product. \( M_o \), \( M_i \), and \( M_f \) are meta-learned matrices acting on the output-channel, input-channel, and filter dimensions. The transformed gradient replaces \( \nabla_\theta L_{T_i} \) in the inner update:
  \[ \mathrm{MC}(G)=G\times_3 M_f\times_2 M_i\times_1 M_o \]

These formulas summarize how the model classifies query samples and how its parameters are adapted and meta-learned.
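The prototype, classification-probability, and Meta-Curvature formulas can be checked numerically with a small NumPy sketch. It assumes squared Euclidean distance for d, which the summary leaves unspecified. It also assumes a layer gradient of shape (C_out, C_in, F) for Meta-Curvature. The function names are illustrative, and this is not the authors' implementation. `inner_step` shows only the MAML inner update, optionally with the Meta-Curvature transform applied to the gradient. The outer meta-update, which differentiates through this step, is omitted.

```python
import numpy as np


def prototypes(support_emb, support_labels, n_way):
    """c_k: mean embedding of the support samples of class k."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_way)])


def class_probabilities(query_emb, protos):
    """p(y=k|x): softmax over negative distances to the prototypes.

    Assumption: d is the squared Euclidean distance.
    """
    diff = query_emb[:, None, :] - protos[None, :, :]
    logits = -(diff ** 2).sum(axis=-1)            # shape (n_query, n_way)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def meta_curvature(grad, m_o, m_i, m_f):
    """MC(G) = G x_3 M_f x_2 M_i x_1 M_o for G of shape (C_out, C_in, F).

    Each matrix multiplies one mode of G; since the modes differ,
    the three mode-n products can be applied in any order.
    """
    return np.einsum("oa,ib,fc,abc->oif", m_o, m_i, m_f, grad)


def inner_step(theta, grad, alpha, curvature=None):
    """theta' = theta - alpha * grad (MAML), or alpha * MC(grad) (Meta-Curvature)."""
    if curvature is not None:
        grad = meta_curvature(grad, *curvature)
    return theta - alpha * grad


if __name__ == "__main__":
    emb = np.array([[0., 0.], [2., 0.], [10., 10.], [12., 10.]])
    labels = np.array([0, 0, 1, 1])
    c = prototypes(emb, labels, n_way=2)
    print(c.tolist())  # [[1.0, 0.0], [11.0, 10.0]]
    p = class_probabilities(np.array([[1., 1.]]), c)
    print(p.argmax(axis=1))  # [0]
```

With identity matrices for M_o, M_i, and M_f, `meta_curvature` returns the gradient unchanged, so Meta-Curvature reduces to a plain MAML inner step. Using `np.einsum` keeps the three mode products in one call instead of reshaping G for each mode.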

Episodic fine-tuning prototypical networks for optimization-based few-shot learning: Application to audio classification

MetaNODE: Prototype Optimization as a Neural ODE for Few-Shot Learning

Prototype Optimization with Neural ODE for Few-Shot Learning

Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

Neural Fine-Tuning Search for Few-Shot Learning

Reweighting and Information-Guidance Networks for Few-Shot Learning

Prototype Relationship Optimization Network for Few‐Shot Learning

Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning

Meta-Learning Adversarial Domain Adaptation Network for Few-Shot Text Classification

Prototype Completion for Few-Shot Learning

Adaptive Fine-Tuning Strategy for Few-Shot Learning

Meta-Learning for Semi-Supervised Few-Shot Classification

Contrastive prototype network with prototype augmentation for few-shot classification

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Learning Class-level Prototypes for Few-shot Learning

Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

Pre-Finetuning for Few-Shot Emotional Speech Recognition

ProtoRefine: Enhancing Prototypes with Similar Structure in Few-Shot Learning

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Hybrid Attention-Based Prototypical Networks for Few-Shot Sound Classification

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning