Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks

Denise Moussa, Germans Hirsch, Christian Riess
DOI: https://doi.org/10.1007/978-3-031-37742-6_22
2024-05-03
Abstract: Freely available and easy-to-use audio editing tools make it straightforward to perform audio splicing. Convincing forgeries can be created by combining various speech samples from the same person. Detection of such splices is important both in the public sector when considering misinformation, and in a legal context to verify the integrity of evidence. Unfortunately, most existing detection algorithms for audio splicing use handcrafted features and make specific assumptions. However, criminal investigators are often faced with audio samples from unconstrained sources with unknown characteristics, which raises the need for more generally applicable methods. With this work, we aim to take a first step towards unconstrained audio splicing detection to address this need. We simulate various attack scenarios in the form of post-processing operations that may disguise splicing. We propose a Transformer sequence-to-sequence (seq2seq) network for splicing detection and localization. Our extensive evaluation shows that the proposed method outperforms existing dedicated approaches for splicing detection [3, 10] as well as the general-purpose networks EfficientNet [28] and RegNet [25].
Sound, Artificial Intelligence, Computer Vision and Pattern Recognition, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the limitations of audio splicing detection and localization in practical applications. Most current audio splicing detection algorithms rely on handcrafted features and work well on data recorded under specific conditions, but perform poorly on audio samples from unconstrained sources. Such samples usually have unknown characteristics, for example different recording environments or multiple rounds of compression. The open problem is therefore to develop a more generally applicable method that can detect and localize audio splicing points across varied conditions. Specifically, the main objectives of the paper are:

1. **Propose a new method**: Use a Transformer sequence-to-sequence (seq2seq) network to detect and localize audio splicing points. The method aims to overcome the limitations of existing approaches and to be more widely applicable.
2. **Simulate multiple attack scenarios**: Apply different post-processing operations (such as MP3 and AMR-NB compression, and the addition of synthetic and real noise) to test how robust the method is when splicing points are disguised.
3. **Evaluate the effectiveness of the method**: Compare against existing dedicated audio splicing detectors (both handcrafted-feature methods and neural network methods) and against general-purpose deep learning models (such as EfficientNet and RegNet) to verify the performance of the proposed method.

Through these objectives, the paper aims to advance audio splicing detection so that it is better suited to information verification in the public sector and to evidence review in legal contexts.
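To make the idea of a disguised splice more concrete, the following is a minimal sketch, not the authors' data pipeline. It concatenates segments, records the splice positions as ground truth, and then applies one post-processing step. The assumptions are loud ones: sine tones stand in for speech, additive white Gaussian noise stands in for the paper's post-processing operations (MP3 and AMR-NB compression are not modeled), and the names `splice`, `add_noise`, and `tone` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def tone(freq_hz, n_samples, sample_rate=16000):
    """Generate a sine tone; a stand-in for a speech segment (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def splice(segments):
    """Concatenate segments and return (signal, splice positions in samples)."""
    signal = []
    positions = []
    for seg in segments:
        if signal:
            # A splice point is the index where a new segment begins.
            positions.append(len(signal))
        signal.extend(seg)
    return signal, positions


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at approximately the requested SNR (in dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


rng = random.Random(0)  # fixed seed so the example is reproducible
spliced, positions = splice([tone(220, 8000), tone(330, 4000), tone(440, 4000)])
noisy = add_noise(spliced, snr_db=20, rng=rng)
```

Here `positions` holds the ground-truth splice locations that a localization model would be trained to predict, while `noisy` is the disguised input it would actually see. A realistic setup would replace the tones with speech recordings and add compression steps via an external codec.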