Toward Pitch-Insensitive Speaker Verification Via Soundfield
Xinfeng Li, Zhicong Zheng, Chen Yan, Chaohao Li, Xiaoyu Ji, Wenyuan Xu
DOI: https://doi.org/10.1109/jiot.2023.3290001
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Automatic speaker verification systems (ASVs) verify a person’s identity by voice and have been widely deployed for user authentication. However, existing ASVs rely on traditional audio spectral features and hence perform poorly when verifying pitch-changed utterances from speakers with a cold or sore throat. In this article, we propose soundfield tracker (SOFTER), a soundfield-based speaker verification system that can verify speakers regardless of pitch changes. SOFTER is based on the observation that soundfield features reflect the speaker’s vocal tract, mouth, head, torso, etc., which are less affected by pitch changes in speech signals. SOFTER can be integrated into off-the-shelf smartphones without any hardware modifications. One major challenge is that the soundfield is sensitive to the distance between the speaker and the phone. To solve this problem, we propose a two-stage mechanism combining distance sensing and soundfield reconstruction, which reconstructs the soundfield to a setting similar to that of the enrollment phase, so that the speaker can be verified at any distance from the phone. We compare SOFTER with six state-of-the-art academic and commercial ASVs on two data sets of 134 speakers and 31,000 speech samples. Results show that SOFTER achieves equal error rates (EERs) of 2.18% and 1.61% on the two data sets, respectively. Moreover, SOFTER outperforms other ASVs by at least 24.67% on average in verifying pitch-varying or pathological speech samples, demonstrating SOFTER’s effectiveness under both normal and unhealthy user conditions.
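For readers unfamiliar with the equal error rate reported above, the following is a minimal sketch of how an EER can be estimated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the accept rule (score at or above the threshold), and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from two score lists (hypothetical helper).

    Assumes a higher score means "more likely the enrolled speaker" and
    that a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the candidate threshold where the false
    accept rate (FAR) and false reject rate (FRR) are closest.
    """
    # Every observed score is a candidate threshold.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # FAR: fraction of impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one genuine trial is rejected and one impostor trial is accepted at the chosen threshold, so both error rates equal 25%. A lower EER, such as the 2.18% and 1.61% reported in the abstract, means the two error rates balance at a much smaller value.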