Abstract: Recent years have witnessed a surge in biometric-based user authentication for mobile devices, owing to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication relies on unique characteristics of either the voiceprint or mouth movements, which makes it vulnerable to replay and mimicry attacks. During speaking, the vocal tract, including its static shape and dynamic movements, also exhibits individual uniqueness, and it is difficult for adversaries to eavesdrop on or imitate. Hence, our work aims to exploit the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication schemes are passphrase-dependent, which significantly degrades the user experience. Such schemes therefore need to be implemented in a passphrase-independent manner while remaining able to resist various attacks. In this paper, we propose a user authentication system, VocalLock, which uses acoustic signals to sense the whole vocal tract during speaking and identifies different individuals in a passphrase-independent manner on smartphones. VocalLock first applies FMCW (frequency-modulated continuous wave) to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speaking, and then constructs a passphrase-independent user authentication model based on the unique characteristics of the vocal tract through GMM-UBM (Gaussian mixture model with a universal background model). The proposed VocalLock can resist various spoofing attacks while achieving a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate user identity in a passphrase-independent manner and successfully resist various attacks.
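To make the FMCW step more concrete, the following is a minimal sketch of acoustic FMCW ranging for a single ideal reflector. It is not VocalLock's implementation: the abstract gives no signal parameters, so the sample rate, start frequency, bandwidth, chirp duration, speed of sound, and the function names `chirp_phase` and `estimate_distance` are all assumptions chosen for illustration. The sketch transmits a linear chirp, multiplies it with a delayed echo, and converts the resulting beat frequency f_b into a distance via d = c * f_b * T / (2 * B).

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def chirp_phase(t, f0, bandwidth, duration):
    """Phase of a linear up-chirp sweeping f0 -> f0 + bandwidth over `duration`."""
    return 2 * math.pi * (f0 * t + 0.5 * (bandwidth / duration) * t * t)


def estimate_distance(distance_m, fs=48000, f0=17000.0,
                      bandwidth=4000.0, duration=0.01, max_bin=20):
    """Simulate one chirp reflected at `distance_m` and estimate that distance.

    All default parameters are hypothetical, not taken from the paper.
    """
    n = int(fs * duration)
    delay = 2 * distance_m / SPEED_OF_SOUND  # round-trip time of the echo

    # Mix (multiply) the transmitted chirp with the delayed echo.
    mixed = []
    for i in range(n):
        t = i / fs
        tx = math.cos(chirp_phase(t, f0, bandwidth, duration))
        # The echo is absent until the round-trip delay has elapsed.
        rx = math.cos(chirp_phase(t - delay, f0, bandwidth, duration)) if t >= delay else 0.0
        mixed.append(tx * rx)

    # The low-frequency beat tone encodes the delay; search only low DFT bins
    # (skipping DC) so the high-frequency mixing product is ignored.
    best_bin, best_mag = 1, -1.0
    for k in range(1, max_bin + 1):
        acc = sum(mixed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)

    beat_hz = best_bin * fs / n
    # f_b = B * delay / T  =>  d = c * f_b * T / (2 * B)
    return SPEED_OF_SOUND * beat_hz * duration / (2 * bandwidth)


if __name__ == "__main__":
    resolution = SPEED_OF_SOUND / (2 * 4000.0)  # range resolution c / (2B)
    print("range resolution (m):", resolution)
    print("estimate for a 0.15 m reflector (m):", estimate_distance(0.15))
```

With these assumed parameters the range resolution c / (2B) is roughly 4.3 cm, so the estimate is only expected to fall within that margin of the true distance. A real vocal tract produces many overlapping reflections rather than a single echo, which this sketch does not model.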

Vocal tract characteristic on long-term formant distribution

Study of Long-Term Formant Distributions in Forensic Phonetics

Recording Device Identification Based on Cepstral Mixed Features

Forensic Speech Enhancement Based on Two-Dimensional Fractional Fourier Transform Domain

VocalLock

Toward Pitch-Insensitive Speaker Verification Via Soundfield

The Catcher in the Field

Speech Length Threshold in Forensic Speaker Comparison by Using Long-Term Cumulative Formant (LTCF) Analysis

Forensic Speech Information Hiding Using Fractional Cosine-Cepstrum Transform

Fusing linguistic and acoustic information for automated forensic speaker comparison

An Empirical Study of the Effects of Pure Real-World Conditions on the Reliability of Forensic Phonetic Features

How Distinguishable Are Vocoder Models? Analyzing Vocoder Fingerprints for Fake Audio

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

Identification of Speaker from Disguised Voice Using MFCC Feature Extraction, Chi-Square and Classification Technique

A Hybrid Method for Acoustic Analysis of the Vocal Tract During Vowel Production

Chinese Speech Feature Analysis and Recognition Based on Sinusoidal Model

Influence of recording system on voiceprint recognition

Impact of Naturalistic Field Acoustic Environments on Forensic Text-independent Speaker Verification System

Correlations Between Vocal Tract Parameters and Body Heights in Adult Humans

A Study of Mandarin Chinese Using X-Ray and MRI

Detection of Operation Type and Order for Digital Speech