A Cross-Database Study of Voice Presentation Attack Detection

Pavel Korshunov, Sébastien Marcel
DOI: https://doi.org/10.1007/978-3-319-92627-8_16
2019-01-01
Abstract: Despite increasing interest in speaker recognition technologies, a significant obstacle still hinders their wide deployment: their high vulnerability to spoofing, or presentation attacks. These attacks can be easy to perform. For instance, if an attacker has access to a speech sample of a target user, he/she can replay it to the recognition system during authentication using a loudspeaker or a smartphone. The ease of executing presentation attacks, together with the fact that they require no technical knowledge of the biometric system, makes these attacks especially threatening in practical applications. Therefore, recent research focuses on collecting databases of such attacks and on developing presentation attack detection (PAD) systems. In this chapter, we present an overview of the latest databases and of techniques for detecting presentation attacks. We consider several prominent databases that contain bona fide and attack data, including ASVspoof 2015, ASVspoof 2017, AVspoof, voicePA, and BioCPqD-PA (the only proprietary database). Using these databases, we focus on the performance of PAD systems in the cross-database scenario in the presence of "unknown" attacks (attacks not available during training), because these scenarios are closer to practice, where pretrained systems need to detect attacks in unforeseen conditions. We first present and discuss the performance of PAD systems based on handcrafted features and traditional Gaussian mixture model (GMM) classifiers. We then examine whether score fusion techniques can improve the performance of PAD systems. We also present some of the latest results of using neural networks for presentation attack detection.
The experiments show that PAD systems struggle to generalize across databases and are mostly unable to detect unknown attacks, with systems based on neural networks performing better than systems based on handcrafted features.
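The abstract refers to PAD systems that pair handcrafted features with GMM classifiers, and to score fusion. The following is a minimal sketch of that general idea under stated assumptions, not the chapter's implementation. One diagonal-covariance GMM is trained on bona fide feature frames and another on attack frames. An utterance is scored by its average per-frame log-likelihood ratio, and the scores of several systems are fused by a plain mean. Feature extraction is omitted, and the function names (`fit_diag_gmm`, `pad_score`, `fuse_scores`), the component count, the fixed number of EM iterations, and the mean-fusion rule are hypothetical choices made only for illustration.

```python
import numpy as np


def _component_log_pdf(X, means, variances):
    """Log-density of every frame under every diagonal Gaussian: shape (n_frames, n_components)."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (mahal + log_det[None, :])


def fit_diag_gmm(X, n_components=4, n_iter=50, reg=1e-6, seed=0):
    """Hypothetical helper: fit a diagonal-covariance GMM with plain EM (fixed iterations, no convergence check)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each frame
        log_joint = _component_log_pdf(X, means, variances) + np.log(weights)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: re-estimate mixture weights, means, and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, reg)
    return weights, means, variances


def gmm_frame_loglik(X, gmm):
    """Per-frame log-likelihood log p(x | GMM)."""
    weights, means, variances = gmm
    return np.logaddexp.reduce(_component_log_pdf(X, means, variances) + np.log(weights), axis=1)


def pad_score(frames, bona_fide_gmm, attack_gmm):
    """Hypothetical helper: average per-frame log-likelihood ratio; positive values favor bona fide speech."""
    llr = gmm_frame_loglik(frames, bona_fide_gmm) - gmm_frame_loglik(frames, attack_gmm)
    return float(np.mean(llr))


def fuse_scores(score_lists):
    """Hypothetical helper: mean fusion of the scores several PAD systems gave to the same trials."""
    return np.mean(np.asarray(score_lists, dtype=float), axis=0)
```

In a cross-database evaluation, the two GMMs would be trained on one database and `pad_score` applied to utterances from another, with the decision threshold set on development data. Scores from different systems are usually normalized or calibrated before fusion. The plain mean is used here only to keep the sketch short.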