Detection of Operation Type and Order for Digital Speech
Tingting Wu, Diqun Yan, Li Xiang, Rangding Wang
DOI: https://doi.org/10.1007/978-981-15-2756-2_3
2019-12-22
Abstract: Most existing speech forensic methods implicitly assume that the suspected speech either has or has not been processed by one specific operation. In practice, however, the type of operation applied to the target speech is usually unknown to the forensic analyst, and in most cases multiple operations may be involved to conceal forgery traces. Few works have considered these issues. In this study, we propose a universal forensic algorithm that can detect four typical speech operations: pitch shifting, noise adding, low-pass filtering, and high-pass filtering. The algorithm is motivated by the observation that different operations affect Mel-frequency cepstral coefficients (MFCC) in different ways. The statistical moments of the MFCC are extracted as detection features. Additionally, cepstral mean and variance normalization (CMVN), a computationally efficient normalization technique, is used to eliminate the impact of channel noise. Finally, an ensemble binary classifier is used to detect the operation type, and multiclass classifiers are adopted to identify the order of operations. Experimental results on the TIMIT and UME-ERJ datasets show that the proposed forensic features achieve good performance in detecting both operation type and operation order. The results also demonstrate that the proposed algorithm is robust against MP3 compression attacks.
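The two feature-processing steps named in the abstract, CMVN and statistical moments of the MFCC, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the MFCC frames have already been computed and are given as a list of frames, each a list of coefficients. The function names `cmvn` and `moment_features`, the use of per-utterance normalization, and the choice of four moments (mean, variance, skewness, kurtosis) are assumptions made for illustration; the abstract does not state which moments are used or how the two steps are combined.

```python
import math
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # frames x coefficients (hypothetical layout)


def cmvn(frames: Frames, eps: float = 1e-10) -> List[List[float]]:
    """Per-utterance cepstral mean and variance normalization.

    Each coefficient track is shifted to zero mean and scaled to unit
    variance over all frames of the utterance.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]


def moment_features(frames: Frames, eps: float = 1e-10) -> List[float]:
    """Mean, variance, skewness and kurtosis of each coefficient track.

    Returns a flat vector of length 4 * number_of_coefficients.
    The choice of these four moments is an assumption of this sketch.
    """
    n = len(frames)
    feats: List[float] = []
    for d in range(len(frames[0])):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var)
        skew = (sum((x - mean) ** 3 for x in col) / n) / (std ** 3 + eps)
        kurt = (sum((x - mean) ** 4 for x in col) / n) / (var ** 2 + eps)
        feats.extend([mean, var, skew, kurt])
    return feats


# Toy input: 3 frames with 2 coefficients each (made-up numbers, not real MFCCs).
toy = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
normalized = cmvn(toy)
features = moment_features(toy)
```

Note that after per-utterance CMVN the mean and variance of every coefficient track are 0 and 1 by construction, so only the higher-order moments would still carry information if moments were computed on the normalized frames. The resulting feature vector would then be passed to the classifiers, which this sketch does not cover.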