Abstract: This paper investigates the feasibility of class-incremental learning (CIL) for Sound Event Localization and Detection (SELD) tasks. The method features an incremental learner that can learn new sound classes independently while preserving knowledge of old classes. Continual learning is achieved through a mean-squared-error-based distillation loss that minimizes output discrepancies between successive learners. Experiments on the TAU-NIGENS Spatial Sound Events 2021 dataset, which includes 12 sound classes, demonstrate the efficacy of the proposed method. We first learn 8 classes and introduce the remaining 4 classes in the next stage. After the incremental phase, the system is evaluated on the full set of learned classes. Results show that, on this realistic dataset, our proposed method maintains baseline performance across all metrics.
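Going from 8 learned classes to 12 means the Stage-1 learner needs outputs for 4 classes that the Stage-0 learner never had. The abstract does not say how the network is extended, so the sketch below shows one common option under a loudly stated assumption: the learner ends in a linear layer with one weight row per class, and that layer is grown by copying the Stage-0 rows unchanged and appending small random rows for the new classes. `expand_output_layer` and `init_scale` are hypothetical names. Real SELD heads usually emit more than one value per class, such as activity and direction, so this is a simplified illustration rather than the paper's architecture.

```python
import numpy as np


def expand_output_layer(weights, bias, num_new, rng=None, init_scale=0.01):
    """Grow a per-class linear output layer from C_old to C_old + num_new classes.

    Hypothetical helper (assumption: one weight row and one bias per class).
    weights: (C_old, D) rows of the Stage-0 learner's output layer.
    bias:    (C_old,) biases of the Stage-0 output layer.
    Old-class rows are copied unchanged; new-class rows start near zero.
    """
    weights = np.asarray(weights, dtype=float)
    bias = np.asarray(bias, dtype=float)
    if weights.shape[0] != bias.shape[0]:
        raise ValueError("weights and bias must have the same number of classes")
    rng = np.random.default_rng(0) if rng is None else rng
    # Small random initialization for the rows of the newly added classes.
    new_w = rng.normal(0.0, init_scale, size=(num_new, weights.shape[1]))
    new_b = np.zeros(num_new)
    # Old classes first, new classes appended after them.
    return np.vstack([weights, new_w]), np.concatenate([bias, new_b])


# Stage 0 learned 8 classes; Stage 1 adds 4 more.
w0, b0 = np.ones((8, 16)), np.zeros(8)
w1, b1 = expand_output_layer(w0, b0, num_new=4)
print(w1.shape, b1.shape)
```

Keeping the old rows intact means the Stage-1 learner starts out reproducing the Stage-0 outputs on the original 8 classes, which is the behavior a distillation term then tries to preserve during training.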

What problem does this paper attempt to address?

This paper addresses the following problem: how to implement Class-Incremental Learning (CIL) in Sound Event Localization and Detection (SELD) tasks, so that a model can learn new sound classes independently, without retraining on all previous data, while maintaining its ability to recognize old classes.

### Specific description of the problem

1. **Limitations of existing methods**: Current SELD models are usually trained on a fixed set of sound classes. Once such a model is trained, adding new sound classes requires retraining or fine-tuning the entire model. Fine-tuning, however, can lead to catastrophic forgetting: the model loses its knowledge of old classes while learning new ones.
2. **Requirements of practical applications**: In applications such as surveillance, robotics, and smart home devices, the system must be flexible enough to add new sound classes dynamically without retraining the entire model. This improves the system's adaptability and reduces computational cost.

### Solutions proposed in the paper

The paper proposes a class-incremental SELD method (CIL-SELD), which addresses these problems as follows:

- **Phased learning**: A base model is first trained to recognize 8 initial sound classes. In the incremental phase, 4 new sound classes are introduced without retraining the entire model.
- **Output distillation loss**: To prevent catastrophic forgetting, the Mean Squared Error (MSE) is used as the distillation loss. When new classes are introduced, this loss keeps the model's predicted outputs for the old classes consistent with the outputs of the previous stage. The overall loss is:

\[ L=(1 - \lambda) L_{\text{MSE}}+\lambda L_{\text{OD}} \]

where:

- \( L_{\text{MSE}} \) is the MSE loss on the 4 new classes.
- \( L_{\text{OD}} \) is the output distillation loss, which minimizes the output difference between the old model (Stage 0) and the updated model (Stage 1) on the original 8 classes.
- \( \lambda \) is a balancing parameter that controls the trade-off between learning new knowledge and retaining old knowledge.

In this way, CIL-SELD maintains the ability to recognize old classes while new classes are introduced, yielding a more flexible and efficient SELD system.
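The combined loss \( L=(1 - \lambda) L_{\text{MSE}}+\lambda L_{\text{OD}} \) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions not stated in the summary: outputs are arrays whose last axis holds one value per class, the 8 old classes occupy the first slots and the 4 new classes the last slots, and the frozen Stage-0 outputs are computed on the same input batch. `cil_seld_loss` and `NUM_OLD`/`NUM_NEW` are hypothetical names, not the paper's code.

```python
import numpy as np

# Hypothetical class layout: 8 base classes first, then the 4 incremental classes.
NUM_OLD = 8
NUM_NEW = 4


def mse(a, b):
    """Mean squared error averaged over all elements."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def cil_seld_loss(stage1_out, new_targets, stage0_out, lam=0.5):
    """Compute L = (1 - lam) * L_MSE + lam * L_OD.

    stage1_out:  (..., NUM_OLD + NUM_NEW) outputs of the updated (Stage-1) learner.
    new_targets: (..., NUM_NEW) ground-truth targets for the new classes only.
    stage0_out:  (..., NUM_OLD) outputs of the frozen Stage-0 learner on the same input.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    stage1_out = np.asarray(stage1_out, dtype=float)
    # Supervised term: fit the new-class outputs to their labels.
    l_mse = mse(stage1_out[..., NUM_OLD:], new_targets)
    # Distillation term: keep old-class outputs close to the Stage-0 outputs.
    l_od = mse(stage1_out[..., :NUM_OLD], stage0_out)
    return (1.0 - lam) * l_mse + lam * l_od


# Toy batch: 2 clips x 5 frames, 12 classes in the Stage-1 output.
out = np.zeros((2, 5, 12))
targets = np.ones((2, 5, 4))
old = np.zeros((2, 5, 8))
print(cil_seld_loss(out, targets, old, lam=0.5))
```

Setting `lam=0` reduces this to plain training on the new classes (no protection against forgetting), while `lam=1` only preserves Stage-0 behavior and ignores the new labels. In a real training loop, the Stage-0 outputs would come from a frozen copy of the old model so that no gradients flow into it.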

Class-Incremental Learning for Sound Event Localization and Detection

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

Incremental Learning Algorithm for Sound Event Detection

Class-Incremental Learning for Multi-Label Audio Classification

Incremental Learning of Acoustic Scenes and Sound Events

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

CoLoC: Conditioned Localizer and Classifier for Sound Event Localization and Detection

Few-shot Class-incremental Audio Classification Using Stochastic Classifier

Class-Incremental Learning for SAR Multi-Class Target Detection

Active Learning for Sound Event Detection

Audio-Visual Class-Incremental Learning

Self-Supervised Incremental Learning for Sound Source Localization in Complex Indoor Environment

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Incremental Learning Based on Probabilistic SVM and SVDD and Its Application to Acoustic Signal Recognition

Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios

PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection

Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection

Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

A Class-Incremental Detection Method of Remote Sensing Images Based on Selective Distillation

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

Joint Spatio-Temporal-Frequency Representation Learning for Improved Sound Event Localization and Detection