Graph Attention Networks for Anti-Spoofing

Hemlata Tak, Jee-weon Jung, Jose Patino, Massimiliano Todisco, Nicholas Evans

DOI: https://doi.org/10.48550/arXiv.2104.03654

2021-04-08

Abstract: The cues needed to detect spoofing attacks against automatic speaker verification are often located in specific spectral sub-bands or temporal segments. Previous works show the potential to learn these using either spectral or temporal self-attention mechanisms but not the relationships between neighbouring sub-bands or segments. This paper reports our use of graph attention networks (GATs) to model these relationships and to improve spoofing detection performance. GATs leverage a self-attention mechanism over graph-structured data to model the data manifold and the relationships between nodes. Our graph is constructed from representations produced by a ResNet. Nodes in the graph represent information either in specific sub-bands or temporal segments. Experiments performed on the ASVspoof 2019 logical access database show that our GAT-based model with temporal attention outperforms all of our baseline single systems. Furthermore, GAT-based systems are complementary to a set of existing systems. The fusion of GAT-based models with more conventional countermeasures delivers a 47% relative improvement in performance compared to the best performing single GAT system.
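As a reading aid for the "47% relative improvement" figure, the relation below spells out what such a number usually means. It assumes a lower-is-better metric (for example an error rate); the abstract does not name the metric, and the symbols m_single and m_fused are introduced here only for illustration.

```latex
\[
  \text{relative improvement}
  = \frac{m_{\text{single}} - m_{\text{fused}}}{m_{\text{single}}}
  = 0.47
  \quad\Longrightarrow\quad
  m_{\text{fused}} = 0.53\, m_{\text{single}}
\]
```

Under that assumption, the fused system's metric value is roughly half that of the best single GAT system.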

Audio and Speech Processing, Cryptography and Security, Sound

What problem does this paper attempt to address?

This paper addresses the detection of spoofing attacks against automatic speaker verification (ASV) systems. Specifically, it focuses on identifying the processing artefacts left by voice signal manipulation or synthesis, which usually reside in specific spectral sub-bands or temporal segments. To improve detection, the paper proposes using graph attention networks (GATs) to model the relationships between different sub-bands or temporal segments, relationships that earlier spectral or temporal self-attention approaches did not capture. The main contribution is this use of GATs: a self-attention mechanism operates on graph-structured data, where each node carries the information of a specific sub-band or temporal segment, so that the relationships between nodes are learned explicitly. Experiments on the ASVspoof 2019 logical access database show that the GAT-based model with temporal attention outperforms all of the authors' single-system baselines, and that fusing GAT-based models with more conventional countermeasures improves performance further.
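The attention-over-nodes idea can be illustrated with a minimal sketch of one graph attention layer on a fully connected graph. This follows the widely used GAT scoring form (a learned vector applied to concatenated node pairs, then a softmax over neighbours), which is not necessarily the exact formulation used in the paper. The node features here are random stand-ins for ResNet outputs, and all names and sizes (`graph_attention_layer`, `NUM_NODES`, `F_IN`, `F_OUT`) are hypothetical.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def project(nodes, weight):
    """Linear projection of every node: (N, F_in) x (F_in, F_out) -> (N, F_out)."""
    return [[sum(x * weight[k][j] for k, x in enumerate(node))
             for j in range(len(weight[0]))] for node in nodes]


def graph_attention_layer(nodes, weight, attn_vec):
    """One attention head over a fully connected graph (assumed topology).

    Score e_ij = LeakyReLU(a^T [h_i || h_j]), split into a source and a
    destination term; weights are a softmax of e_ij over all nodes j.
    """
    h = project(nodes, weight)
    f_out = len(weight[0])
    src = [sum(a * v for a, v in zip(attn_vec[:f_out], hi)) for hi in h]
    dst = [sum(a * v for a, v in zip(attn_vec[f_out:], hj)) for hj in h]
    attn, out = [], []
    for i in range(len(h)):
        scores = [leaky_relu(src[i] + dst[j]) for j in range(len(h))]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        attn.append(row)
        # New node feature: attention-weighted sum of projected neighbours.
        out.append([sum(row[j] * h[j][k] for j in range(len(h)))
                    for k in range(f_out)])
    return out, attn


random.seed(0)
NUM_NODES, F_IN, F_OUT = 6, 8, 4  # e.g. 6 sub-bands or temporal segments
nodes = [[random.gauss(0, 1) for _ in range(F_IN)] for _ in range(NUM_NODES)]
weight = [[random.gauss(0, 0.1) for _ in range(F_OUT)] for _ in range(F_IN)]
attn_vec = [random.gauss(0, 1) for _ in range(2 * F_OUT)]

out, attn = graph_attention_layer(nodes, weight, attn_vec)
print(len(out), len(out[0]))  # number of nodes, output feature size
```

Each row of `attn` is a probability distribution over nodes, so it can be read as how strongly one sub-band or segment attends to the others. In a trained system `weight` and `attn_vec` would be learned rather than sampled.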

Multi-Level Information Aggregation Based Graph Attention Networks Towards Fake Speech Detection

Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection

Spoofing Detection in the Physical Layer with Graph Neural Networks

Self-Attention and MLP Auxiliary Convolution for Face Anti-Spoofing

Spectral Graph Attention Network with Fast Eigen-approximation

Voice Presentation Attack Detection Using Convolutional Neural Networks

How to Boost Anti-Spoofing with X-Vectors.

Improved LightCNN with Attention Modules for ASV Spoofing Detection

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection

ConvNeXt Based Neural Network for Audio Anti-Spoofing

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

Deep Frequent Spatial Temporal Learning for Face Anti-Spoofing

FGDNet: Fine-Grained Detection Network Towards Face Anti-Spoofing

Towards Attention-based Contrastive Learning for Audio Spoof Detection

A Robust graph attention network with dynamic adjusted Graph

Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Speaker-Aware Anti-Spoofing

Twice attention networks for synthetic speech detection