Abstract: Most Digital Audio Tampering Detection (DATD) methods based on Electrical Network Frequency (ENF) concentrate on the static spatial information of ENF and neglect the temporal variation in the ENF time series. This limits the representation capability of ENF features and reduces tampering detection accuracy. To address this gap, we propose a digital audio tampering detection method based on ENF spatio-temporal feature representation learning. We construct a parallel spatio-temporal network model that combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network to extract deep spatial and temporal ENF features. First, we apply high-precision Discrete Fourier Transform (DFT) analysis to the digital audio to extract ENF phase sequences. For the spatial features, the phase sequences are adaptively divided into frames through frame shifting, which yields feature matrices of uniform size. For the temporal features, the phase sequences are segmented into frames according to how the ENF changes over time. The CNN then extracts deep spatial features and the BiLSTM extracts deep temporal features. To further strengthen the spatio-temporal representation, an attention mechanism dynamically assigns weights to the deep spatial and temporal features. Finally, a deep neural network determines whether the audio has been tampered with.
Experimental results on three public databases for digital audio tampering detection show that our approach outperforms six state-of-the-art methods. By modeling both the spatial and temporal aspects of ENF, the method improves detection accuracy.
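
The ENF phase extraction and frame-shift step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a plain single-bin DFT at an assumed nominal frequency of 50 Hz stands in for the high-precision DFT analysis, and the function names `enf_phase_sequence` and `frame_with_shift`, the frame length, and the matrix size are hypothetical choices made for illustration.

```python
import numpy as np


def enf_phase_sequence(signal, fs, f_nominal=50.0, cycles_per_frame=10):
    """Return one phase estimate per non-overlapping frame.

    Assumption: a single-bin DFT at f_nominal is used here as a simple
    stand-in for the high-precision DFT analysis named in the abstract.
    """
    frame_len = int(round(cycles_per_frame * fs / f_nominal))
    n_frames = len(signal) // frame_len
    n = np.arange(frame_len)
    window = np.hanning(frame_len)
    phases = np.empty(n_frames)
    for i in range(n_frames):
        start = i * frame_len
        frame = signal[start:start + frame_len] * window
        # Absolute sample indices keep the phase reference fixed across frames.
        kernel = np.exp(-2j * np.pi * f_nominal * (start + n) / fs)
        phases[i] = np.angle(np.sum(frame * kernel))
    return phases


def frame_with_shift(seq, frame_len, n_frames):
    """Cut seq into n_frames overlapping frames of frame_len samples.

    The frame shift adapts to len(seq), so sequences of different
    lengths all map to a matrix of shape (n_frames, frame_len).
    """
    if len(seq) < frame_len:
        raise ValueError("sequence is shorter than one frame")
    starts = np.floor(np.linspace(0, len(seq) - frame_len, n_frames)).astype(int)
    return np.stack([seq[s:s + frame_len] for s in starts])


# Synthetic example: a 50 Hz tone whose phase jumps by 1 rad at t = 5 s,
# a crude imitation of the discontinuity a splice can introduce.
fs = 1000
t = np.arange(10 * fs) / fs
offset = np.where(t < 5.0, 0.3, 1.3)
audio = np.cos(2 * np.pi * 50.0 * t + offset)

phases = enf_phase_sequence(audio, fs)
spatial_matrix = frame_with_shift(phases, frame_len=8, n_frames=8)
```

In this sketch `spatial_matrix` plays the role of the uniform-size feature matrix that a CNN branch would consume, while the raw `phases` sequence (or frames cut from it in time order) is what a BiLSTM branch would read.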

Using Deep Belief Network to Capture Temporal Information for Audio Event Classification

Audio Event Recognition Based on DBN Features from Multiple Filter-Bank Representations

Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network

Auditory Scene Classification with Deep Belief Network

Bipolar Population Threshold Encoding for Audio Recognition with Deep Spiking Neural Networks

Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Adaptive DCTNet for Audio Signal Classification

Learning Long-Term Filter Banks for Audio Source Separation and Audio Scene Classification

Deep Neural Network Derived Bottleneck Features For Accurate Audio Classification

Deep Neural Network Based Environment Sound Classification and Its Implementation on Hearing Aid App

Temporal Coding of Local Spectrogram Features for Robust Sound Recognition

Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition

Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks

Deep Belief Networks Based Voice Activity Detection

Hierarchical-Concatenate Fusion TDNN for sound event classification

Balanced Deep CCA for Bird Vocalization Detection

Multi-level Fusion of Audio and Visual Features for Speaker Identification

Multi-mode Study of Deep Learning Applications in Acoustic Signal Processing

Deep Learning Applied to Dereverberation and Sound Event Classification in Reverberant Environments

Digital audio tampering detection based on spatio-temporal representation learning of electrical network frequency

Learning Temporal Resolution in Spectrogram for Audio Classification