Cochleagram-based Identification of Electronic Disguised Voice with Pitch Scaling in the Noisy Environment.
Wen Dou,Hongxia Wang,Ruixi Yang
DOI: https://doi.org/10.1145/3321408.3326664
2019-01-01
Abstract: Audio editing software makes voice disguise easy, which threatens the security and authenticity of audio. Whether audio forensics can identify software-disguised voices has therefore become an important issue. At the same time, because audio recorded in daily life almost always contains noise, improving anti-noise performance is another key concern. This paper proposes an algorithm for identifying electronically disguised voice with pitch scaling that has high anti-noise performance. The algorithm is based on a Least Mean Square (LMS) filter and the cochleagram, an acoustic feature that reflects the auditory characteristics of the human ear. The noisy voice is first passed through the LMS filter for noise reduction, and a cochleagram is then extracted from the filter output. The cochleagram is computed at different resolutions to construct the Least Mean Square-Multi Resolution Cochleagram (LMS-MRCG) feature. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) is used as the detection classifier to identify disguised voice. The pitch scaling covers 5 different pitch types for each speaker's voice, and the algorithm must identify the pitch type for each speaker. The results show that the algorithm achieves a high detection rate and that voices of different genders and languages can both be identified. Under various environmental noises, such as Gaussian white noise, pink noise, factory noise, and vehicle noise, the algorithm maintains stable identification performance. In particular, it maintains high forensic classification accuracy in low-SNR environments. In a noise-free environment, the overall identification rate reaches 97.50%. At an SNR as low as -5 dB, the identification rate still remains above 85.83%.
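The abstract does not say how the LMS filter is configured (reference signal, number of taps, step size), so the following is only a minimal sketch of the generic LMS weight update, not the authors' denoising front end. The function name `lms_filter`, the parameters `num_taps` and `mu`, and the toy system `h_true` are all hypothetical choices made for illustration. The demo uses a simple system-identification task, because convergence is easy to check there.

```python
import numpy as np


def lms_filter(x, d, num_taps=8, mu=0.01):
    """Generic LMS: adapt FIR weights w so that filtering x approximates d.

    Returns the filter output y, the error e = d - y, and the final weights.
    All parameter names and defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        # Tap vector, most recent sample first: x[n], x[n-1], ...
        u = x[n - num_taps + 1:n + 1][::-1]
        y[n] = w @ u              # current filter output
        e[n] = d[n] - y[n]        # instantaneous error
        w = w + mu * e[n] * u     # LMS weight update
    return y, e, w


# Toy demo (assumption, not from the paper): recover a known 3-tap FIR system.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)            # white-noise input
h_true = np.array([0.5, -0.3, 0.2])      # hypothetical unknown system
d = np.convolve(x, h_true)[:len(x)]      # desired signal
y, e, w = lms_filter(x, d, num_taps=3, mu=0.05)
```

In a noise-reduction setting, `x` and `d` would instead be chosen so that the filter output or the error signal carries the cleaned voice; which of the two applies depends on the configuration, which the abstract leaves unspecified.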