A Unique Identification-Oriented Black-Box Watermarking Scheme for Deep Classification Neural Networks

Mouke Mo, Chuntao Wang, Shan Bian
DOI: https://doi.org/10.3390/sym16030299
2024-03-05
Symmetry
Abstract: Given the substantial value and considerable training costs of deep neural network models, deep neural network model watermarking has come to the forefront. While black-box model watermarking has made commendable strides, the methods used in the existing literature to construct poisoned images are simplistic and susceptible to forgery. Notably, few black-box model watermarking techniques can identify a unique user in a multi-user model distribution setting. For this reason, this paper proposes a novel black-box model watermarking method for unique identification, denoted as the ID watermarking of neural networks (IDwNet). Specifically, to enhance the distinguishability of deep neural network models in multi-user scenarios and reduce the likelihood of poisoned image counterfeiting, this study develops a discrete cosine transform (DCT) and singular value decomposition (SVD)-based symmetrical embedding method to form the poisoned image. As this ID embedding method leads to indistinguishable deep features, the study constructs a poisoned adversary training strategy that simultaneously inputs clean images, poisoned images with the correct ID, and poisoned adversary images with incorrect IDs to train a deep neural network. Extensive simulation experiments show that the proposed scheme achieves excellent invisibility for the concealed ID, markedly surpassing the state of the art. In addition, the proposed scheme obtains a validation success rate exceeding 99% for the poisoned images at the cost of a marginal classification accuracy reduction of less than 0.5%. Moreover, even when there is only a 1-bit discrepancy between IDs, the proposed scheme still accurately validates user copyright. These results indicate that the proposed scheme is promising.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses two weaknesses of current black-box model watermarking: poisoned images are constructed too simply and are therefore easy to forge, and few techniques can identify a unique user when a model is distributed to multiple users. The paper proposes a new black-box model watermarking method for unique identification in deep classification neural networks, called IDwNet. The method addresses these problems in the following ways:

1. **Enhancing resistance to forgery**: Poisoned images are constructed with a symmetric embedding method that combines the discrete cosine transform (DCT) and singular value decomposition (SVD). The embedded ID is hard for human observers to detect, which reduces the likelihood of forgery.
2. **Improving unique identification in a multi-user environment**: Poisoned adversarial images with wrong IDs are introduced during training, which strengthens the model's ability to distinguish between IDs. User copyright can be verified accurately even when two IDs differ by only 1 bit.
3. **Maintaining high fidelity and reliability**: Embedding the watermark does not significantly affect the model's classification accuracy, and the watermark can be verified efficiently.
4. **Improving robustness**: The watermark is not easily removed or destroyed. An attacker who forcibly removes or destroys it seriously degrades the model's accuracy.

The main contributions of the paper include:

- Proposing the concept of black-box ID model watermarking, which can uniquely identify users in a multi-user distribution framework.
- Designing a method for constructing poisoned images based on digital watermarking techniques, which can dynamically adjust embedding parameters, enhance the concealment of poisoned images, and maintain the fidelity and reliability of black-box model watermarks.
- Introducing a poisoned adversarial image training strategy that enhances the model's ability to recognize the embedded ID and thereby to distinguish the owner of the model.

Experimental results on multiple datasets show that the proposed IDwNet scheme performs well in black-box model watermarking, with strong resistance to forgery, high fidelity, and high reliability.
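The DCT and SVD-based ID embedding and the wrong-ID adversary images described above can be illustrated with a minimal sketch. This is not the paper's embedding rule, which is not specified here. The sketch assumes a grayscale image, one ID bit per 8x8 block, and quantization of the largest singular value of each block's DCT coefficients, a common choice in classical DCT-SVD watermarking. The names `embed_id`, `extract_id`, `one_bit_neighbors`, and the `step` parameter are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x @ C.T is the 2-D DCT of an n x n block x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def _block_origin(index: int, width: int, block: int) -> tuple[int, int]:
    """Top-left pixel of the index-th block, scanning blocks row by row."""
    per_row = width // block
    return (index // per_row) * block, (index % per_row) * block


def embed_id(image: np.ndarray, id_bits: list[int],
             block: int = 8, step: float = 24.0) -> np.ndarray:
    """Return a float copy of `image` carrying one ID bit per block (assumed scheme)."""
    h, w = image.shape
    if len(id_bits) > (h // block) * (w // block):
        raise ValueError("image too small for this ID length")
    c = dct_matrix(block)
    out = image.astype(np.float64).copy()
    for n, bit in enumerate(id_bits):
        r, col = _block_origin(n, w, block)
        coeffs = c @ out[r:r + block, col:col + block] @ c.T
        u, s, vt = np.linalg.svd(coeffs)
        # Snap the largest singular value to the lower or upper half of its
        # quantization cell depending on the bit (assumption, not the paper's rule).
        s[0] = np.floor(s[0] / step) * step + (0.75 if bit else 0.25) * step
        out[r:r + block, col:col + block] = c.T @ ((u * s) @ vt) @ c
    return out


def extract_id(image: np.ndarray, n_bits: int,
               block: int = 8, step: float = 24.0) -> list[int]:
    """Read the bits back from the largest singular value of each block's DCT."""
    c = dct_matrix(block)
    img = image.astype(np.float64)
    bits = []
    for n in range(n_bits):
        r, col = _block_origin(n, img.shape[1], block)
        s = np.linalg.svd(c @ img[r:r + block, col:col + block] @ c.T,
                          compute_uv=False)
        bits.append(int((s[0] % step) >= step / 2))
    return bits


def one_bit_neighbors(id_bits: list[int]) -> list[list[int]]:
    """All IDs that differ from `id_bits` in exactly one bit position."""
    neighbors = []
    for i in range(len(id_bits)):
        flipped = list(id_bits)
        flipped[i] ^= 1
        neighbors.append(flipped)
    return neighbors
```

In a training setup like the one summarized above, `embed_id(img, user_id)` would stand in for a poisoned image with the correct ID, and `embed_id(img, wrong_id)` for each `wrong_id` in `one_bit_neighbors(user_id)` would stand in for the hardest poisoned adversary images. In the paper the trained network itself verifies the ID; `extract_id` appears here only to check that the sketch's embedding is self-consistent.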