Backend Ensemble for Speaker Verification and Spoofing Countermeasure

Li Zhang, Yue Li, Huan Zhao, Qing Wang, Lei Xie

DOI: https://doi.org/10.48550/arXiv.2207.01802

2022-09-23

Abstract: This paper describes the NPU system submitted to the Spoofing Aware Speaker Verification Challenge 2022. We focus on the backend ensemble for speaker verification and spoofing countermeasure from three aspects. First, besides simple concatenation, we propose circulant matrix transformation and stacking for speaker embeddings and countermeasure embeddings. By stacking the newly defined circulant embeddings, we explore almost all possible interactions between speaker embeddings and countermeasure embeddings. Second, we try different convolutional neural networks to selectively fuse the embeddings' salient regions into channels with convolution kernels. Finally, we design parallel attention in 1D convolutional neural networks to learn the global correlation in the channel dimension as well as the important parts in the feature dimension. We also embed squeeze-and-excitation attention in 2D convolutional neural networks to learn the global dependence among speaker embeddings and countermeasure embeddings. Experimental results demonstrate that all of the above methods are effective. After fusing four well-trained models enhanced by these methods, the best SASV-EER, SPF-EER and SV-EER we achieve on the evaluation set are 0.559%, 0.354% and 0.857%, respectively. With these contributions, our submitted system achieves fifth place in the challenge.
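To make the circulant transformation and stacking concrete, the following is a minimal sketch and not the authors' implementation. It assumes that both embeddings have the same dimension, that each row of the circulant matrix is the embedding cyclically shifted right by one more position, and that "stacking" means placing the two matrices in separate channels of a 2D CNN input. The function names `circulant` and `stack_circulant` are hypothetical, and the exact construction in the paper may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def circulant(embedding: Vector) -> Matrix:
    """Return the n x n circulant matrix of `embedding`.

    Row i is the embedding cyclically shifted right by i positions,
    so every element appears once in every column.
    """
    n = len(embedding)
    return [[embedding[(j - i) % n] for j in range(n)] for i in range(n)]


def stack_circulant(speaker: Vector, countermeasure: Vector) -> List[Matrix]:
    """Stack both circulant matrices as a 2-channel map of shape (2, n, n).

    Assumption: both embeddings share one dimension; real systems would
    likely project them to a common size first.
    """
    if len(speaker) != len(countermeasure):
        raise ValueError("this sketch assumes embeddings of equal dimension")
    return [circulant(speaker), circulant(countermeasure)]


# Toy 3-dimensional embeddings (illustrative values only).
speaker_emb = [1.0, 2.0, 3.0]
cm_emb = [0.1, 0.2, 0.3]

stacked = stack_circulant(speaker_emb, cm_emb)
for row in stacked[0]:
    print(row)
```

A 2D convolution applied to such a stacked input sees shifted copies of both embeddings in each receptive field, which is one plausible way a network could mix speaker and countermeasure information.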

Sound, Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses how to detect spoofing attacks effectively in Automatic Speaker Verification (ASV) systems, especially Text-to-Speech (TTS) and Voice Conversion (VC) attacks in the Logical Access (LA) scenario. It focuses on the backend ensemble of speaker verification and spoofing countermeasure models, and aims to improve both tasks in three ways:

1. **Embedding fusion method**: Besides simple concatenation, the paper proposes circulant matrix transformation and stacking to fuse speaker embeddings and countermeasure embeddings, so that almost all possible interactions between them are explored.
2. **Different convolutional neural network frameworks**: Different Convolutional Neural Networks (CNNs) are used to selectively fuse the salient regions of the embeddings into channels.
3. **Attention mechanism design**: Parallel attention is introduced in 1D CNNs to learn the global correlation in the channel dimension and the important parts in the feature dimension, and Squeeze-and-Excitation (SE) attention is embedded in 2D CNNs to learn the global dependence among speaker embeddings and countermeasure embeddings.

With these methods, the paper aims to build a backend ensemble that extracts more useful information from speaker embeddings and countermeasure embeddings, improving both the accuracy of speaker verification and the reliability of spoofing detection. The abstract reports that all three methods are effective: after fusing four models, the best SASV-EER, SPF-EER and SV-EER on the evaluation set are 0.559%, 0.354% and 0.857%, respectively, and the system placed fifth in the Spoofing Aware Speaker Verification Challenge 2022.
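The squeeze-and-excitation attention mentioned in the third point can be illustrated with a short sketch. This is the generic SE operation (global average pooling, a bottleneck of two fully connected layers, then channel-wise rescaling) written in plain Python, not the paper's network. The function name `squeeze_excite`, the omission of bias terms, and the toy weights are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def squeeze_excite(channels: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Rescale each channel of a (C, H, W) feature map by a gate in (0, 1).

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    reduction ratio. Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels
    ]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer with sigmoid gives the gates.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every element of a channel by that channel's gate.
    return [[[g * x for x in row] for row in ch] for ch, g in zip(channels, gates)]


# Two 1x2 channels, e.g. one derived from each embedding type (toy values).
feature_map = [[[1.0, 3.0]], [[2.0, 2.0]]]
w1 = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel gates

reweighted = squeeze_excite(feature_map, w1, w2)
print(reweighted)
```

Because each gate depends on the pooled statistics of all channels, the rescaling of one channel is informed by the others, which is the sense in which SE attention captures global dependence across channels.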

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

Norm-constrained Score-level Ensemble for Spoofing Aware Speaker Verification

End-to-end Spoofing Speech Detection and Knowledge Distillation under Noisy Conditions

SA-SASV: An End-to-End Spoof-Aggregated Spoofing-Aware Speaker Verification System

NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference

Towards single integrated spoofing-aware speaker verification embeddings

SASV Based on Pre-trained ASV System and Integrated Scoring Module

USTC-KXDIGIT System Description for ASVspoof5 Challenge

Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification

The Vicomtech Spoofing-Aware Biometric System for the SASV Challenge

Voice Presentation Attack Detection Using Convolutional Neural Networks