Abstract: The rapid evolution of speech synthesis and voice conversion has raised substantial concerns about the potential misuse of such technology, creating a pressing need for effective audio deepfake detection mechanisms. Existing detection models are highly successful at discriminating known types of deepfake audio but struggle when they encounter new attack types. Continual learning has emerged as one effective approach to this challenge. In this paper, we propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection. The fundamental idea behind RWM is to divide all classes into two groups: those whose feature distributions are compact across tasks, such as genuine audio, and those whose distributions are more spread out, such as the various types of fake audio. These distinctions are quantified by the in-class cosine distance, which then serves as the basis for RWM to introduce a trainable gradient modification direction for each data type. Experimental evaluations against mainstream continual learning methods show that RWM is superior in both knowledge acquisition and mitigating forgetting in audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection, demonstrating its potential significance in other machine learning domains such as image recognition.
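To make "in-class cosine distance" concrete, here is a minimal sketch that assumes it is measured as the mean pairwise cosine distance (1 minus cosine similarity) between embeddings of the same class. The paper's exact definition may differ (for example, it could be measured against a class centroid), and the toy embeddings and the names `cosine_similarity` and `in_class_cosine_distance` are illustrative, not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_class_cosine_distance(features):
    """Mean pairwise cosine distance (1 - cosine similarity) within one class.

    Assumption: pairwise averaging; the paper may define it differently.
    """
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Made-up 2-D embeddings: genuine clips point in nearly the same direction,
# while fake clips from different attack types are spread out.
genuine = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
fake = [[1.0, 0.0], [0.0, 1.0], [-0.7, 0.7]]

print(in_class_cosine_distance(genuine))
print(in_class_cosine_distance(fake))
```

With these made-up vectors the genuine class scores close to 0 and the fake class scores about 1, matching the intuition that genuine audio forms a compact cluster while different attack types scatter.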

### What problem does this paper attempt to solve?

This paper addresses continual learning for audio deepfake detection. Specifically:

1. **Existing Challenges**
   - Current audio deepfake detection models perform well on known types of deepfake audio but degrade significantly when faced with new attack types.
   - A method is needed that lets models adapt to new types of deepfakes robustly, without losing what they have already learned.
2. **Proposed Method**
   - The paper proposes a continual learning method called Radian Weight Modification (RWM) for audio deepfake detection.
   - RWM divides all classes into two groups: classes whose feature distributions stay compact across tasks (e.g., genuine audio) and classes whose feature distributions are more spread out (e.g., the various types of fake audio). The distinction is quantified with the in-class cosine distance, on which RWM bases a trainable gradient modification direction for each data type.
3. **Experimental Validation**
   - In experiments on audio deepfake detection, RWM outperforms mainstream continual learning methods in both knowledge acquisition and mitigating forgetting.
   - RWM's applicability also extends beyond audio deepfake detection; the authors point to its potential in other machine learning domains such as image recognition.

In summary, the paper targets the performance degradation of existing audio deepfake detection models on new attack types, proposes RWM to address it, and shows that RWM outperforms mainstream continual learning baselines.
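The summary says RWM learns a gradient modification direction but does not give the update rule. As a hypothetical illustration only, the sketch below blends a raw gradient with its projection orthogonal to a stored old-task feature direction, using the sigmoid of a trainable logit as the blend weight. This is a generic orthogonal-projection scheme in the spirit of weight-modification methods, not the paper's RWM formulation; `modify_gradient`, `project_out`, `old_direction`, and `logit` are hypothetical names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(grad, direction):
    """Remove the component of `grad` along `direction` (orthogonal projection)."""
    scale = dot(grad, direction) / dot(direction, direction)
    return [g - scale * d for g, d in zip(grad, direction)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def modify_gradient(grad, old_direction, logit):
    """Blend the raw gradient with its component orthogonal to an old-task direction.

    Hypothetical: `logit` stands in for a trainable parameter, and
    sigmoid(logit) in (0, 1) is the blend weight toward the projected gradient.
    """
    alpha = sigmoid(logit)
    ortho = project_out(grad, old_direction)
    return [(1.0 - alpha) * g + alpha * o for g, o in zip(grad, ortho)]


grad = [1.0, 1.0]
old_direction = [1.0, 0.0]  # hypothetical feature direction from earlier tasks
print(modify_gradient(grad, old_direction, logit=-10.0))  # close to [1.0, 1.0]
print(modify_gradient(grad, old_direction, logit=10.0))   # close to [0.0, 1.0]
```

A weight near 1 removes the update along the old direction (protecting earlier knowledge), while a weight near 0 leaves the gradient untouched (favoring new learning). Under this assumed scheme, compact classes such as genuine audio and spread-out fake classes could receive different learned weights, but how RWM actually derives its modification direction is specified in the paper, not here.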

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection