Abstract: We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the attribute embeddings is on par with the original raw spoof CM embeddings for both tasks. The best performance achieved with the proposed approach for spoofing detection and attack attribution, in terms of accuracy, is 99.7% and 99.2%, respectively, compared to 99.7% and 94.7% using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.

### What problem does this paper attempt to address?

This paper addresses the lack of interpretability in spoofed speech detection and spoofing attack attribution. The authors represent spoofed speech with explainable probabilistic attribute embeddings. Conventional approaches rely on high-dimensional raw embeddings whose dimensions are difficult to interpret; the probabilistic attributes instead gauge the presence or absence of the sub-components that make up a specific spoofing attack.

#### Main problems:

1. **Interpretability of spoofed speech detection**: Existing countermeasures (CMs) perform well, but they are usually black-box models that cannot explain their decisions. This matters especially in critical fields such as forensics.
2. **Accuracy of spoofing attack attribution**: Besides distinguishing real from spoofed speech, it is also necessary to identify the specific spoofing method and its components so that attacks can be attributed more accurately.

#### Solutions:

- **Probabilistic attribute embedding**: A set of probabilistic attributes measures the presence or absence of each module (such as acoustic feature prediction, waveform generation, and speaker modeling) in the spoofed speech generation process.
- **Decision tree classifier**: To keep the back-end interpretable as well, a decision tree classifier is used for the downstream tasks (spoofing detection and attack attribution).
- **Shapley value analysis**: Shapley values quantify the contribution of each attribute to the spoofing detection and attack attribution tasks, showing which attributes drive each decision.
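As a rough illustration of the probabilistic attribute embedding idea, the sketch below turns per-attribute classifier scores into posterior probabilities and concatenates them into a single vector whose dimensions each have a readable name. The attribute values, the `ATTRIBUTES` table, the `attribute_embedding` helper, and the example logits are all hypothetical and chosen only for illustration; the summary above does not specify how the paper's attribute classifiers are built.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attributes and candidate values (assumption, not from the paper).
ATTRIBUTES = {
    "input_type": ["text", "speech"],
    "waveform_generation": ["neural_vocoder", "classical_vocoder", "other"],
    "speaker_modeling": ["none", "one_hot", "embedding"],
}

def attribute_embedding(logits_per_attribute):
    """Concatenate per-attribute posteriors into one named vector."""
    names, embedding = [], []
    for attr, values in ATTRIBUTES.items():
        probs = softmax(logits_per_attribute[attr])
        if len(probs) != len(values):
            raise ValueError(f"expected {len(values)} logits for {attr}")
        embedding.extend(probs)
        names.extend(f"{attr}={v}" for v in values)
    return names, embedding

# Made-up classifier scores for a single utterance.
example_logits = {
    "input_type": [2.0, -1.0],
    "waveform_generation": [0.5, 3.0, -2.0],
    "speaker_modeling": [0.0, 0.0, 4.0],
}
names, emb = attribute_embedding(example_logits)
for name, p in zip(names, emb):
    print(f"{name}: {p:.3f}")
```

Because every dimension is a named probability, a decision tree trained on such vectors (for example with scikit-learn's `DecisionTreeClassifier`) would yield split rules that read directly in terms of attack components, which is the kind of back-end interpretability the summary describes.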
#### Experimental results:

- Experiments on the ASVspoof2019 dataset show that probabilistic attribute embeddings perform on par with the original raw CM embeddings for spoofing detection and better for attack attribution.
- The best spoofing detection accuracy is 99.7% and the best attack attribution accuracy is 99.2%, compared with 99.7% and 94.7% for the raw CM embeddings.
- Shapley value analysis shows that attributes related to acoustic feature prediction, waveform generation, and speaker modeling are important for spoofing detection, while duration modeling, waveform generation, and input type matter more for attack attribution.

In conclusion, by introducing probabilistic attribute embeddings and Shapley value analysis, the paper makes spoofed speech detection and attack attribution more interpretable while matching or exceeding the accuracy of raw CM embeddings.
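To make the Shapley analysis concrete, the following sketch computes exact Shapley values by enumerating every coalition of attributes. It is a generic textbook computation, not necessarily the estimator used in the paper. The `toy_value` function and its numbers are invented for illustration: they stand in for "accuracy obtained when only this subset of attributes is available".

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values; cost grows as 2^n, so keep n small."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi

def toy_value(subset):
    """Hypothetical score for an attribute subset (made-up numbers)."""
    score = 0.5  # chance-level baseline with no attributes
    if "vocoder" in subset:
        score += 0.3
    if "duration" in subset:
        score += 0.1
    if "vocoder" in subset and "duration" in subset:
        score += 0.05  # interaction bonus, shared by both attributes
    return score

attributes = ["vocoder", "duration", "input_type"]
phi = shapley_values(attributes, toy_value)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the interaction bonus is split evenly between the two attributes that create it, an attribute that never changes the score receives zero, and the values sum to the gap between the full-set score and the baseline. These are the properties that make Shapley values a reasonable way to rank attribute importance.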

An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization
