Dynamic Ensemble Teacher-Student Distillation Framework for Light-weight Fake Audio Detection

Jun Xue, Cunhang Fan, Jiangyan Yi, Jian Zhou, Zhao Lv
DOI: https://doi.org/10.1109/lsp.2024.3431936
2024-01-01
IEEE Signal Processing Letters
Abstract: In recent years, fake audio detection (FAD) has made great progress, and lightweight models are important for fast and reliable audio authenticity verification on resource-limited devices. However, most researchers neglect model size when improving FAD performance. To enable FAD on small devices, this paper proposes a novel lightweight network named Light-ECA2Net. Given that networks with different depths have different abilities to capture fake speech artifacts, this paper proposes a dynamic ensemble teacher-student distillation framework to fully transfer distillation knowledge. The dynamic ensemble distillation has two aspects. First, we adopt one-to-one feature mapping to perceive the multidimensional feature knowledge and dynamically adjust the weight of each feature dimension using ground-truth labels, which enables the student to receive feature knowledge efficiently. Second, different network layers also have their own predictive strengths, so dynamically weighting their predictions further improves the learning ability of the student. Experimental results on the ASVspoof 2019 LA and PA datasets show that, compared to the baseline, our system further improves performance while reducing model complexity by 45%.
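The dynamic prediction weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: each teacher branch is weighted by a softmax over its negative cross-entropy against the ground-truth label, and the student is trained on the weighted sum of KL divergences to the temperature-softened teacher outputs. The function names, the temperature value, and the weighting rule are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one prediction against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dynamic_teacher_weights(teacher_logits, label):
    """Assumed rule: teachers with lower loss on the ground truth get more weight."""
    losses = [cross_entropy(logits, label) for logits in teacher_logits]
    return softmax([-loss for loss in losses])


def ensemble_distillation_loss(student_logits, teacher_logits, label, temperature=2.0):
    """Weighted sum of KL(teacher || student) over all teacher branches."""
    weights = dynamic_teacher_weights(teacher_logits, label)
    student_probs = softmax(student_logits, temperature)
    loss = 0.0
    for weight, logits in zip(weights, teacher_logits):
        teacher_probs = softmax(logits, temperature)
        loss += weight * kl_divergence(teacher_probs, student_probs)
    return loss, weights


if __name__ == "__main__":
    # Three hypothetical teacher branches of different depth, two classes
    # (index 0 is taken as the true class for this sample).
    teachers = [[2.0, -1.0], [0.1, 0.0], [-1.0, 1.5]]
    loss, weights = ensemble_distillation_loss([0.5, -0.5], teachers, label=0)
    print("weights:", weights)
    print("distillation loss:", loss)
```

In this sketch the branch that is most confident in the correct class dominates the distillation target for that sample, while a branch that misclassifies it contributes little. The feature-level weighting mentioned in the abstract could follow the same pattern, with a feature-matching distance in place of the KL term.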