Abstract: Most existing methods for audio classification assume that the vocabulary of audio classes to be classified is fixed. When novel (unseen) audio classes appear, audio classification systems need to be retrained with abundant labeled samples of all audio classes in order to recognize both base (initial) and novel audio classes. If novel audio classes continue to appear, existing methods become inefficient or even infeasible. In this work, we propose a method for few-shot class-incremental audio classification, which can continually recognize novel audio classes without forgetting old ones. The framework of our method consists mainly of two parts, an embedding extractor and a classifier, which are constructed separately. The embedding extractor is the backbone of a ResNet-based network; it is trained using only samples of base audio classes and is frozen afterwards. The classifier, in contrast, consists of prototypes and is expanded in incremental sessions by a prototype adaptation network using a few samples of novel audio classes. Both labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since both are informative for audio classification. Three audio datasets, NSynth-100, FSC-89 and LS-100, are built by selecting samples from the audio corpora of NSynth, FSD-MIX-CLIP and LibriSpeech, respectively. Results show that our method outperforms baseline methods in both average accuracy and performance dropping rate. In addition, it is competitive with baseline methods in computational complexity and memory requirement. The code for our method is available at https://github.com/vinceasvp/FCAC.
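
The idea of a prototype classifier that grows across incremental sessions can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain mean-embedding prototypes and cosine similarity, omits the prototype adaptation network entirely, and replaces the frozen ResNet embedding extractor with synthetic 4-dimensional embeddings. All function and variable names (`class_prototypes`, `predict`, `sample`) are hypothetical.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return the sorted class ids and one mean-embedding prototype per class."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def predict(queries, classes, protos):
    """Assign each query to the class of its most cosine-similar prototype."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return classes[np.argmax(q @ p.T, axis=1)]


# Assumption: synthetic embeddings stand in for the frozen extractor's output.
rng = np.random.default_rng(0)
centers = np.eye(4)  # one well-separated direction per class


def sample(class_ids, shots):
    """Draw `shots` noisy embeddings for each class id."""
    labels = np.repeat(class_ids, shots)
    noise = 0.05 * rng.normal(size=(labels.size, centers.shape[1]))
    return centers[labels] + noise, labels


# Base session: abundant samples of the base classes 0 and 1.
base_x, base_y = sample(np.array([0, 1]), 20)
classes, protos = class_prototypes(base_x, base_y)

# Incremental session: 5 shots of the novel classes 2 and 3.
# The old prototypes are kept unchanged; new ones are appended.
novel_x, novel_y = sample(np.array([2, 3]), 5)
novel_classes, novel_protos = class_prototypes(novel_x, novel_y)
classes = np.concatenate([classes, novel_classes])
protos = np.vstack([protos, novel_protos])

# Evaluate on queries drawn from all classes seen so far.
query_x, query_y = sample(np.array([0, 1, 2, 3]), 10)
accuracy = float((predict(query_x, classes, protos) == query_y).mean())
print(f"accuracy over base and novel classes: {accuracy:.2f}")
```

Because the base prototypes are never overwritten, adding novel classes in this sketch cannot change how the base classes are represented; in the proposed method the prototype adaptation network additionally refines the prototypes using support and query samples.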
