LRPD: Large Replay Parallel Dataset

Ivan Yakovlev, Mikhail Melnikov, Nikita Bukhal, Rostislav Makarov, Alexander Alenin, Nikita Torgashov, Anton Okhotnikov
DOI: https://doi.org/10.1109/ICASSP43922.2022.9746527
2023-09-29
Abstract: Recent research in voice anti-spoofing (VAS) shows that deep neural networks (DNNs) outperform classic approaches such as GMMs in presentation attack detection. However, DNNs require a lot of data to converge and still lack generalization ability. To foster progress in neural network systems, we introduce the Large Replay Parallel Dataset (LRPD), aimed at the detection of replay attacks. LRPD contains more than 1M utterances collected by 19 recording devices in 17 different environments. We also provide an example training pipeline in PyTorch [1] and a baseline system that achieves a 0.28% Equal Error Rate (EER) on the evaluation subset of LRPD and an 11.91% EER on the publicly available ASVspoof 2017 [2] eval set. These results show that a model trained on LRPD performs consistently under fully unknown conditions. Our dataset is free for research purposes and hosted on Google Drive. Baseline code and pre-trained models are available on GitHub.
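The Equal Error Rate quoted above is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). A minimal sketch of how such a figure could be computed from detector scores follows. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: higher score = more likely genuine (bona fide) speech.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine utterances scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR where they are closest.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores, purely illustrative (not taken from LRPD or ASVspoof 2017).
genuine = [0.9, 0.8, 0.7, 0.6]
spoof = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one of four spoofed samples is accepted and one of four genuine samples is rejected at the chosen threshold, so the two error rates meet at 25%.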
Audio and Speech Processing, Sound