An Application-Oriented Taxonomy on Spoofing, Disguise and Countermeasures in Speaker Recognition

Lantian Li, Xingliang Cheng, Thomas Fang Zheng
DOI: https://doi.org/10.1561/116.00000017
2022-01-01
APSIPA Transactions on Signal and Information Processing
Abstract: Speaker recognition aims to recognize the identity of the speaking person. After decades of research, current speaker recognition systems achieve satisfactory performance and have been deployed in a wide range of practical applications. However, substantial evidence shows that these systems are susceptible to malicious fake actions in real applications. To address this issue, the research community has responded with dedicated countermeasures that aim to defend against such actions. Several recent reviews and surveys describe the state of the art, but they generally rely on a canonical taxonomy that categorizes spoofing attacks and the corresponding countermeasures from a technology-oriented perspective. This paper provides a new taxonomy from an application-oriented perspective and extends it to two major fake forms: spoofing attack and disguise cheating. The taxonomy starts from the applications of speaker recognition technology, e.g., access control, surveillance, and forensics, and then divides the two fake forms according to application scenario: a spoofing attack imitates the voice of an authorized speaker to gain access to the target system, whereas disguise cheating makes someone unrecognizable by altering his or her voice. For each fake form, finer-grained categories and related countermeasures are presented. Finally, this paper discusses future research directions in this area and suggests that the research community should not only focus on the technical view but also connect with application scenarios.