An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen
2024-09-17
Abstract: We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of the sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracy achieved with the proposed approach is 99.7% for spoofing detection and 99.2% for attack attribution, compared to 99.7% and 94.7%, respectively, using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses the lack of interpretability in spoofed speech detection and attack source tracing (attack attribution). The authors propose representing spoofed speech through interpretable probabilistic attribute embeddings. Traditional methods usually rely on high-dimensional raw embeddings, which are difficult to interpret; the method in this paper instead uses probabilistic attributes that gauge the presence or absence of the sub-components that make up a specific spoofing attack.

#### Main problems:
1. **Interpretability of spoofed speech detection**: Existing countermeasures (CMs) perform well, but they are usually black-box models that cannot explain their decisions. Explanations are especially important in critical fields such as forensics.
2. **Accuracy of spoofing attack source tracing**: Beyond distinguishing real from spoofed speech, it is also necessary to identify the specific spoofing method and its components in order to trace an attack to its source accurately.

#### Solutions:
- **Probabilistic attribute embedding**: A set of probabilistic attributes is designed to measure the presence or absence of each module (such as acoustic feature prediction, waveform generation, and speaker modeling) in the spoofed speech generation process.
- **Decision tree classifier**: To keep the back-end interpretable as well, a decision tree classifier is used for both downstream tasks (spoof detection and attack source tracing).
- **Shapley value analysis**: Shapley values are used to quantify the contribution of each attribute to the spoof detection and attack source tracing tasks.
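As a rough illustration of the probabilistic attribute embedding idea, the sketch below concatenates per-attribute posterior probabilities into one interpretable vector. It assumes that each attribute has its own classifier producing one logit per attribute value; the attribute names, logit values, and the `attribute_embedding` helper are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_embedding(attribute_logits):
    """Concatenate per-attribute posteriors into one embedding.

    attribute_logits: dict mapping a (hypothetical) attribute name to the
    logits of its classifier, one logit per possible attribute value.
    Returns (embedding, labels), where labels names each dimension.
    """
    embedding, labels = [], []
    for name, logits in attribute_logits.items():
        for idx, p in enumerate(softmax(logits)):
            embedding.append(p)
            labels.append(f"{name}={idx}")
    return embedding, labels


# Hypothetical logits for a single utterance (illustrative values only).
example_logits = {
    "waveform_generation": [2.0, 0.5, -1.0],
    "speaker_modeling": [0.1, 1.2],
    "duration_modeling": [0.0, 0.0],
}

embedding, labels = attribute_embedding(example_logits)
for label, p in zip(labels, embedding):
    print(f"{label}: {p:.3f}")
```

Because every dimension is a named probability, a downstream decision tree can split on conditions that read directly as statements about attack components, which is the kind of interpretability the summary describes.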
#### Experimental results:
- Experiments on the ASVspoof2019 dataset, with CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST), show that probabilistic attribute embeddings perform comparably to the original raw CM embeddings on both tasks, and better on attack source tracing.
- The best accuracy with the proposed approach is 99.7% for spoof detection and 99.2% for attack source tracing, compared to 99.7% and 94.7%, respectively, with the raw CM embeddings.
- Shapley value analysis shows that attributes related to acoustic feature prediction, waveform generation, and speaker modeling are important for spoof detection, while duration modeling, waveform generation, and input type matter more for attack source tracing.

In conclusion, by introducing probabilistic attribute embeddings and Shapley value analysis, this paper makes spoofed speech detection and attack source tracing more interpretable while matching or exceeding the accuracy of raw CM embeddings.
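To make the Shapley value analysis mentioned above concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of attributes. It is a toy illustration only: the value function `toy_value`, its numbers, and the attribute names are assumptions chosen for this example, and the source does not say which estimator the authors used for their decision tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    players: list of hashable attribute names.
    value: function mapping a frozenset of players to a real number.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical gain from knowing a set of attributes (not paper data)."""
    score = 0.0
    if "vocoder" in coalition:
        score += 0.3
    if "speaker" in coalition:
        score += 0.1
    # Interaction term: duration only helps together with the vocoder.
    if "vocoder" in coalition and "duration" in coalition:
        score += 0.2
    return score


attrs = ["vocoder", "speaker", "duration"]
phi = shapley_values(attrs, toy_value)
for name in attrs:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setup the interaction gain is split equally between the two attributes that produce it, and the values sum to the value of the full attribute set. Exact enumeration grows exponentially with the number of attributes, so it is only practical for small attribute sets like this one.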