Abstract: Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods either use standard electric network frequency (ENF) databases for visual comparison of the ENF continuity of digital audio, or extract features for classification by machine learning methods. ENF databases are usually difficult to obtain, visual methods have weak feature representation, and machine learning methods lose considerable feature information, resulting in low detection accuracy. This paper proposes a method that fuses shallow and deep features to make full use of ENF information, exploiting the complementary nature of features at different levels to describe more accurately the inconsistencies that tampering operations introduce into the original digital audio. First, the audio signal is band-pass filtered to obtain the ENF component, and the discrete Fourier transform (DFT) and Hilbert transform are applied to obtain the phase and instantaneous frequency of that component. Second, three representations are derived: the mean value of the sequence variation is used as the shallow feature; the feature matrix obtained by framing and reshaping the ENF sequence is used as the input of a convolutional neural network (CNN); and fitting-coefficient features are obtained by curve fitting. Third, the CNN extracts the local details of the ENF from the feature matrix, and a deep neural network (DNN) extracts the global information of the ENF from the fitting-coefficient features; together, the global and local information form the deep features of the ENF. The shallow and deep features are then fused using an attention mechanism, which gives greater weight to features useful for classification and suppresses uninformative ones. Finally, tampered audio is detected by dimensionality reduction and fitting with a DNN containing two fully connected layers, and classification is performed using a Softmax layer.
The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, it achieves an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method outperforms state-of-the-art methods.
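The signal-processing front end described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the 50 Hz nominal frequency, the 1 Hz half-bandwidth, the frame length and hop, the polynomial order, a brick-wall FFT mask standing in for the band-pass filter, the mean absolute first difference as one reading of "mean value of the sequence variation", and all function names. Only the instantaneous-frequency branch is shown; the DFT-based phase estimate, the CNN, the DNN, and the attention fusion are omitted.

```python
import numpy as np


def bandpass_fft(x, fs, f_lo, f_hi):
    """Brick-wall band-pass by zeroing FFT bins outside [f_lo, f_hi] Hz.
    Assumption: a simple stand-in for a properly designed band-pass filter."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Analytic signal x + j*H{x}, computed with the FFT (Hilbert transform)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


def frame_matrix(seq, frame_len, hop):
    """Stack overlapping frames of a 1-D sequence into a 2-D matrix."""
    n_frames = 1 + (len(seq) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return seq[idx]


def enf_features(audio, fs, nominal=50.0, half_band=1.0,
                 frame_len=200, hop=100, poly_order=5):
    """Hypothetical helper: all default parameter values are illustrative."""
    enf = bandpass_fft(audio, fs, nominal - half_band, nominal + half_band)
    freq = instantaneous_frequency(enf, fs)
    # Shallow feature: mean absolute sample-to-sample change (one interpretation).
    shallow = float(np.mean(np.abs(np.diff(freq))))
    # Matrix that would feed a CNN in a pipeline of this kind.
    frames = frame_matrix(freq, frame_len, hop)
    # Polynomial fit as an assumed form of curve fitting; coefficients
    # would feed a DNN as the global descriptor.
    axis = np.linspace(-1.0, 1.0, len(freq))
    coeffs = np.polyfit(axis, freq - nominal, poly_order)
    return {"freq": freq, "shallow": shallow, "frames": frames, "coeffs": coeffs}


def synthetic_tone(fs=1000, seconds=4.0, nominal=50.0, splice_phase=0.0):
    """Pure tone with an optional phase jump at the midpoint to mimic a splice."""
    t = np.arange(int(fs * seconds)) / fs
    phase = np.where(t >= seconds / 2, splice_phase, 0.0)
    return np.sin(2.0 * np.pi * nominal * t + phase)


fs = 1000
clean = enf_features(synthetic_tone(fs), fs)
spliced = enf_features(synthetic_tone(fs, splice_phase=np.pi / 2), fs)
print("shallow feature (clean):  ", clean["shallow"])
print("shallow feature (spliced):", spliced["shallow"])
print("frame matrix shape:", spliced["frames"].shape)
```

On the synthetic tone, the phase jump perturbs the instantaneous frequency around the splice point, so the shallow feature of the spliced signal exceeds that of the clean one. Real recordings would additionally need noise handling and a filter with controlled edge behaviour, since the FFT mask treats the signal as periodic.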
