Abstract: Most existing methods for audio classification assume a fixed vocabulary of audio classes. When novel (unseen) audio classes appear, such systems must be retrained with abundant labeled samples of all classes in order to recognize both base (initial) and novel audio classes. If novel audio classes keep appearing, this retraining becomes inefficient and eventually infeasible. In this work, we propose a method for few-shot class-incremental audio classification, which can continually recognize novel audio classes without forgetting old ones. The framework of our method consists mainly of two parts, an embedding extractor and a classifier, whose constructions are decoupled. The embedding extractor is the backbone of a ResNet-based network; it is trained using only samples of base audio classes and frozen afterwards. The classifier, in contrast, consists of prototypes and is expanded in incremental sessions by a prototype adaptation network using few samples of novel audio classes. Both labeled support samples and unlabeled query samples are used to train the prototype adaptation network and update the classifier, since both are informative for audio classification. Three audio datasets, NSynth-100, FSC-89 and LS-100, are built by choosing samples from the audio corpora NSynth, FSD-MIX-CLIP and LibriSpeech, respectively. Results show that our method outperforms baseline methods in average accuracy and performance dropping rate, and is competitive with them in computational complexity and memory requirement. The code for our method is available at https://github.com/vinceasvp/FCAC.
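The decoupled design described in the abstract (a frozen embedding extractor plus a prototype classifier that grows across sessions) can be illustrated with a minimal sketch. This is not the authors' implementation: the ResNet-based extractor is replaced by synthetic embeddings, the prototype adaptation network is omitted, and each prototype is assumed to be the plain mean of a class's support embeddings, compared by cosine similarity. All names (`PrototypeClassifier`, `make_samples`, `expand`) are hypothetical.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return {class id: mean embedding} for every class present in labels."""
    return {int(c): embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}


class PrototypeClassifier:
    """Nearest-prototype classifier that can be expanded with new classes."""

    def __init__(self):
        self.prototypes = {}

    def expand(self, embeddings, labels):
        # Add prototypes for the classes of this session; old ones are kept,
        # so earlier classes are not overwritten (assumption: class ids are disjoint).
        self.prototypes.update(class_prototypes(embeddings, labels))

    def predict(self, embeddings):
        classes = np.array(sorted(self.prototypes))
        protos = np.stack([self.prototypes[c] for c in classes])
        # Cosine similarity between each query embedding and each prototype.
        protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        queries = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        return classes[(queries @ protos.T).argmax(axis=1)]


def make_samples(rng, class_ids, per_class, dim=8):
    """Synthetic stand-in for the output of a frozen embedding extractor."""
    labels = np.repeat(np.array(class_ids), per_class)
    centers = 5.0 * np.eye(dim)[labels]  # one well-separated center per class
    return centers + 0.1 * rng.standard_normal(centers.shape), labels


rng = np.random.default_rng(0)
clf = PrototypeClassifier()

# Base session: many labeled samples of the base classes 0, 1, 2.
base_x, base_y = make_samples(rng, [0, 1, 2], per_class=20)
clf.expand(base_x, base_y)

# Incremental session: only 5 samples (5-shot) of the novel classes 3, 4.
novel_x, novel_y = make_samples(rng, [3, 4], per_class=5)
clf.expand(novel_x, novel_y)

# Evaluate on queries from all classes seen so far.
query_x, query_y = make_samples(rng, [0, 1, 2, 3, 4], per_class=10)
accuracy = float((clf.predict(query_x) == query_y).mean())
print(f"classes: {sorted(clf.prototypes)}, accuracy: {accuracy:.2f}")
```

Because the extractor stays fixed, adding a class only appends one prototype vector, which is the reason an approach of this kind can stay cheap in computation and memory as sessions accumulate.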

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Few-Shot Speaker Identification Using Lightweight Prototypical Network With Feature Grouping and Interaction

Self-attention Based Speaker Recognition Using Cluster-Range Loss

Few Shot Speaker Recognition using Deep Neural Networks

CACRN-Net: A 3D log Mel spectrogram based channel attention convolutional recurrent neural network for few-shot speaker identification

Self-Attention Networks for Text-Independent Speaker Verification

Few-shot short utterance speaker verification using meta-learning

Weighted Cluster-Range Loss and Criticality-Enhancement Loss for Speaker Recognition

Few-shot Underwater Acoustic Target Recognition Based on Siamese Network

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

Speakerfilter: deep learning-based target speaker extraction using anchor speech

Automatic Speaker Recognition with Limited Data

3D Convolutional Neural Networks Based Speaker Identification and Authentication

Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

Speaker Recognition Based on Pre-Trained Model and Deep Clustering

Deep Speaker Feature Learning for Text-independent Speaker Verification

Channel-Spatial-Based Few-Shot Bird Sound Event Detection