Abstract: Deep speaker embedding extractors have become the new state of the art in speaker verification. However, the calibration of verification scores for such systems often receives little attention. Improper score calibration leads to serious issues, especially under unknown acoustic conditions, even when the speaker verification system is strong in terms of threshold-free metrics. This paper investigates several score calibration methods: a classical approach based on logistic regression; the recently proposed magnitude estimation network MagnetO, which uses activations from the pooling layer of the trained deep speaker embedding extractor; and a generalization of this approach based on separate scale and offset prediction neural networks. An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system. The results show that calibration poses no serious problems when in-domain development data are used for calibration tuning. Otherwise, a trade-off arises between good calibration performance and threshold-free system quality. In most cases, adaptive s-norm helps to stabilize score distributions and to improve system performance. However, some experiments show that the novel approaches are limited in their ability to stabilize scores on several datasets.
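The abstract contrasts classical logistic-regression calibration with quality-aware methods and studies the effect of adaptive s-norm applied before calibration. Below is a minimal sketch of those two baseline steps. It assumes a common AS-norm formulation, where the mean and standard deviation are taken over the top-k highest cohort scores on each side. It also assumes a global linear calibration llr = a * s + b, trained with a prior-weighted logistic loss. The function names, the toy data, and the plain gradient-descent trainer are illustrative assumptions, not the paper's implementation. MagnetO and the scale/offset networks would replace the global a and b with per-trial values predicted from embedding-extractor activations. That part is not shown.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical helpers illustrating a common AS-norm variant and classical
# logistic-regression calibration; not necessarily the paper's exact setup.


def adaptive_s_norm(score: float, enroll_cohort: Sequence[float],
                    test_cohort: Sequence[float], top_k: int) -> float:
    """Adaptive symmetric score normalization (AS-norm) of a single trial.

    enroll_cohort / test_cohort: scores of the enrollment / test embedding
    against the same impostor cohort. Only the top_k highest cohort scores
    on each side are used, so top_k must be >= 2 and those scores must not
    all be equal (otherwise the standard deviation is zero).
    """
    top_e = sorted(enroll_cohort, reverse=True)[:top_k]
    top_t = sorted(test_cohort, reverse=True)[:top_k]
    mu_e, sd_e = statistics.mean(top_e), statistics.pstdev(top_e)
    mu_t, sd_t = statistics.mean(top_t), statistics.pstdev(top_t)
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(target_llrs: Sequence[float], nontarget_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits; lower means better calibration."""
    t = sum(_softplus(-l) for l in target_llrs) / len(target_llrs)
    n = sum(_softplus(l) for l in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (t + n) / math.log(2)


def train_linear_calibration(target_scores: Sequence[float],
                             nontarget_scores: Sequence[float],
                             prior: float = 0.5, lr: float = 0.1,
                             iters: int = 2000) -> Tuple[float, float]:
    """Fit llr = a * s + b with prior-weighted logistic regression.

    Full-batch gradient descent on the convex cross-entropy objective;
    with prior = 0.5 this objective is proportional to Cllr.
    """
    logit_prior = math.log(prior / (1.0 - prior))
    w_t = prior / len(target_scores)              # per-trial weight, targets
    w_n = (1.0 - prior) / len(nontarget_scores)   # per-trial weight, non-targets
    a, b = 1.0, 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for s in target_scores:
            # d/dz of -log(sigmoid(z)) is sigmoid(z) - 1
            g = w_t * (_sigmoid(a * s + b + logit_prior) - 1.0)
            grad_a += g * s
            grad_b += g
        for s in nontarget_scores:
            # d/dz of -log(1 - sigmoid(z)) is sigmoid(z)
            g = w_n * _sigmoid(a * s + b + logit_prior)
            grad_a += g * s
            grad_b += g
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

Cllr, computed here by `cllr`, is a standard measure of calibration quality. A global affine map with a > 0 does not reorder trials, so it can lower Cllr while leaving threshold-free metrics such as EER unchanged. Quality-dependent scales and offsets, by contrast, can reorder trials. That is one way the trade-off described in the abstract can arise.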

The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification

The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

Adaptive Large Margin Fine-Tuning for Robust Speaker Verification

Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems

Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio

The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023

Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker Verification

Large Margin Softmax Loss for Speaker Verification

The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022

The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022

Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification

UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification

Learning Discriminative Speaker Embedding by Improving Aggregation Strategy and Loss Function for Speaker Verification

DeltaVLAD: an Efficient Optimization Algorithm to Discriminate Speaker Embedding for Text-Independent Speaker Verification