Adaptive Learning via a Negative Selection Strategy for Few-Shot Bioacoustic Event Detection

Yaxiong Chen, Xueping Zhang, Yunfei Zi, Shengwu Xiong
2024-09-24
Abstract: Although the Prototypical Network (ProtoNet) has demonstrated effectiveness in few-shot biological event detection, two persistent issues remain. Firstly, there is difficulty in constructing a representative negative prototype due to the absence of explicitly annotated negative samples. Secondly, the durations of the target biological vocalisations vary across tasks, making it challenging for the model to consistently yield optimal results across all tasks. To address these issues, we propose a novel adaptive learning framework with an adaptive learning loss to guide classifier updates. Additionally, we propose a negative selection strategy to construct a more representative negative prototype for ProtoNet. All experiments were performed on the DCASE 2023 TASK5 few-shot bioacoustic event detection dataset. The results show that our proposed method achieves an F-measure of 0.703, an improvement of 12.84%.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses two problems:

1. **Difficulty in constructing a representative negative prototype**: In Few-Shot Bioacoustic Event Detection (FSBED), negative samples are not explicitly labeled, so it is hard to build a negative prototype that represents them well. This can weaken the model's ability to distinguish positive from negative samples.
2. **Variation in the duration of target biological sounds across tasks**: The duration of the target sounds differs greatly from task to task, which makes it difficult for the model to obtain optimal results consistently in all tasks. For example, sound segments may be short in some tasks and long in others.

To solve these problems, the authors propose a new adaptive learning framework with the following innovations:

- **Negative Selection Strategy**: More representative negative samples are selected from the query set, which makes the negative prototype more representative.
- **Adaptive Learning Framework**: A teacher model and a student model are combined, and the classifier parameters are adjusted dynamically to better handle tasks with different durations.
- **Adaptive Learning Loss Function**: The degree of knowledge transfer is adjusted according to the duration of the task, thereby optimizing model performance.

Together, these methods raise the F-measure on the DCASE 2023 TASK5 dataset to 0.703, a 12.84% improvement over the baseline method.

### Formula Summary

- **Prototype Calculation Formula**:
  \[ w_k=\frac{1}{|S_k|}\sum_{x_i\in S_k}\frac{f_\varphi(x_i)}{\|f_\varphi(x_i)\|_2} \]
  where \( w_k \) is the prototype of the \( k \)-th class, \( S_k \) is the set of support samples of class \( k \), \( f_\varphi(x_i) \) is the embedded feature output by the feature extractor, and \( \| \cdot \|_2 \) represents the L2 norm.
- **Prediction Probability Distribution Formula**:
  \[ p_{ik}=\frac{\exp(d(w_k,z_i))}{\sum_{c=1}^K\exp(d(w_c,z_i))} \]
  where \( p_{ik} \) is the predicted probability that the embedding \( z_i \) belongs to class \( k \), and \( d(x,y) \) represents the Euclidean distance.
- **Negative Selection Strategy Formula**:
  \[ Z'_n=\left\{z_j \;\middle|\; d(z_j,w_1)>d(w_1,w_2) \,\land\, d(z_j,w_1)-d(z_j,w_2)>\frac{d(w_1,w_2)}{2},\; 0<j\leq|Q|\right\} \]
  where \( Z'_n \) is the set of embeddings selected from the query set \( Q \) as negative samples.
- **Adaptive Learning Loss Function**:
  \[ \text{Loss}=\frac{\text{seglen}}{T}\times\left(D_{KL}-\lambda\times I(X_{st};Y_{st})\right) \]
  where \( D_{KL} \) is the Kullback-Leibler divergence, \( I(X_{st};Y_{st}) \) is the mutual information, and \( \lambda \) and \( T \) are hyperparameters.

Together, these components target the two FSBED problems described above and account for the reported F-measure of 0.703.
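The prototype, prediction, and negative-selection formulas can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and it rests on several assumptions that the summary does not state: \( w_1 \) is taken to be the positive prototype and \( w_2 \) the initial negative prototype; the softmax is taken over negated distances, following the usual ProtoNet convention that a closer prototype gets a higher probability; and the refined negative prototype is rebuilt from the selected query embeddings alone. All function names and the toy 2-D embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def prototype(embeddings):
    """Mean of the L2-normalised embeddings of one class (the w_k formula)."""
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    return [sum(e[i] for e in normed) / len(normed) for i in range(dim)]


def class_probabilities(z, prototypes):
    """Softmax over class scores for one embedding z.

    Assumption: scores are NEGATED Euclidean distances, so that a
    closer prototype receives a higher probability.
    """
    scores = [-math.dist(w, z) for w in prototypes]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_negatives(queries, w_pos, w_neg):
    """Keep query embeddings that satisfy both selection conditions.

    Assumption: w_pos plays the role of w_1 and w_neg the role of w_2.
    """
    gap = math.dist(w_pos, w_neg)
    selected = []
    for z in queries:
        d_pos = math.dist(z, w_pos)
        d_neg = math.dist(z, w_neg)
        if d_pos > gap and d_pos - d_neg > gap / 2:
            selected.append(z)
    return selected


# Toy 2-D embeddings, purely illustrative.
w_pos = prototype([[1.0, 0.0], [2.0, 0.0]])
w_neg = prototype([[0.0, 1.0], [0.0, 3.0]])
queries = [[0.9, 0.1], [-1.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]

selected = select_negatives(queries, w_pos, w_neg)
# Assumption: the refined negative prototype uses the selected queries only.
refined_neg = prototype(selected)
probs = class_probabilities(queries[0], [w_pos, refined_neg])
print(selected, probs)
```

In this toy run, only the queries that lie far from the positive prototype and clearly closer to the negative one pass both conditions, which is the filtering behaviour the selection rule describes.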