Detection on PSOLA-modified Voices by Seeking out Duplicated Fragments

Yifeng Shen, Jia, Lianhong Cai
DOI: https://doi.org/10.1109/icsai.2012.6223483
2012-01-01
Abstract: Pitch Synchronous Overlap-Add (PSOLA) refers to a family of signal processing techniques widely used for prosodic modification. They can be used to modify a person's voice by altering the prosodic characteristics of speech, making the voice unrecognizable or unidentifiable. Well-modified voices may even cause speaker recognition, which is critical in the digital audio forensic framework, to fail. Time-domain PSOLA (TD-PSOLA) is the most popular algorithm in the PSOLA family. TD-PSOLA can apply both time-scaling and pitch-scaling modifications, and the synthesis quality is extremely high provided that the modifications do not exceed a factor of two. This paper presents a simple method to determine whether a given speech waveform has been modified by the TD-PSOLA algorithm. By seeking out duplicated fragments in the time-domain waveform, we extract the number of duplicated fragments as well as their occurrence frequency in the voiced portions of speech. A single feature, the duplicated fragments density (DFD), is then calculated and compared with a threshold (derived from extensive prior statistical results) to decide whether the questioned speech waveform has been modified. Experimental results demonstrate the effectiveness of our method in detecting modified voices whose pitch has been raised and/or whose duration has been lengthened using the TD-PSOLA algorithm.
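The detection pipeline outlined in the abstract (find duplicated fragments in voiced speech, turn their count into a density, compare it with a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fragment length, the matching criterion, or the exact DFD formula, so `frag_len`, exact sample equality on integer PCM samples, "duplicated runs per second of voiced speech", the threshold, and the helper names `count_duplicated_fragments`, `duplicated_fragment_density`, and `is_psola_modified` are all hypothetical choices. The boolean `voiced` mask is assumed to come from some external voiced/unvoiced detector.

```python
import numpy as np


def count_duplicated_fragments(x, frag_len=32, voiced=None):
    """Count runs of samples that exactly repeat an earlier, non-overlapping run.

    Assumption (not the paper's definition): a window of `frag_len` samples
    starting at j is a duplicate if an identical window started at some i with
    j - i >= frag_len. The match is then extended sample by sample and the whole
    repeated run is counted once. If `voiced` (boolean mask, same length as `x`)
    is given, only windows lying entirely in voiced samples are considered.
    """
    n = len(x)
    seen = {}  # raw bytes of a window -> index where it first appeared
    count = 0
    j = 0
    while j + frag_len <= n:
        if voiced is not None and not voiced[j:j + frag_len].all():
            j += 1
            continue
        key = x[j:j + frag_len].tobytes()
        i = seen.get(key)
        if i is not None and j - i >= frag_len:
            # Extend the match as far as the samples keep agreeing.
            k = frag_len
            while j + k < n and x[i + k] == x[j + k]:
                k += 1
            count += 1
            j += k  # jump past the whole repeated run so it is counted once
            continue
        seen.setdefault(key, j)  # remember only the first occurrence
        j += 1
    return count


def duplicated_fragment_density(x, sample_rate, voiced, frag_len=32):
    """Assumed DFD form: duplicated fragments per second of voiced speech."""
    voiced_seconds = np.count_nonzero(voiced) / sample_rate
    if voiced_seconds == 0:
        return 0.0
    return count_duplicated_fragments(x, frag_len, voiced) / voiced_seconds


def is_psola_modified(x, sample_rate, voiced, threshold):
    """Flag a waveform whose DFD exceeds a threshold calibrated on labeled data."""
    return duplicated_fragment_density(x, sample_rate, voiced) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 8000
    # Random int16 noise stands in for unmodified speech (no exact repeats).
    original = rng.integers(-20000, 20000, size=sr, dtype=np.int16)
    # Crudely mimic duration lengthening by repeating one 200-sample segment.
    modified = np.concatenate([original[:4000], original[3800:4000], original[4000:]])
    print(count_duplicated_fragments(original), count_duplicated_fragments(modified))  # 0 1
    voiced = np.ones(len(modified), dtype=bool)  # hypothetical: treat everything as voiced
    print(is_psola_modified(modified, sr, voiced, threshold=0.5))  # True
```

The demo uses random integers rather than real speech, so it only checks the bookkeeping: splicing one repeated segment into the signal, roughly what TD-PSOLA does when it reuses a pitch-synchronous segment to lengthen duration, yields exactly one detected fragment. Exact byte matching is the strictest possible criterion; a practical detector would likely need some tolerance, and, as the abstract describes, the threshold would have to be obtained from statistics on known modified and unmodified recordings.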