Abstract: Watermarking deep neural network (DNN) models has attracted a great deal of attention and interest in recent years because of the increasing demand to protect the intellectual property of DNN models. Many practical algorithms have been proposed by covertly embedding a secret watermark into a given DNN model through either parametric/structural modulation or backdooring against intellectual property infringement from the attacker while preserving the model performance on the original task. Despite the performance of these approaches, the lack of basic research restricts the algorithmic design to either a trial-based method or a data-driven technique. This has motivated the authors in this paper to introduce a game between the model attacker and the model defender for trigger-based black-box model watermarking. For each of the two players, we construct the payoff function and determine the optimal response, which enriches the theoretical foundation of model watermarking and may inspire us to develop novel schemes in the future.

What problem does this paper attempt to address?

The problem this paper addresses is intellectual property protection for deep neural network (DNN) models, and in particular the lack of a theoretical basis for black-box model watermarking. Specifically:

1. **The need for intellectual property protection of DNN models**: DNNs are widely deployed across many fields, and building an advanced DNN model requires large amounts of labeled data, expert knowledge, and substantial computing resources. Preventing such models from being illegally tampered with and sold is therefore a pressing concern.
2. **Limitations of existing watermarking methods**: Most existing DNN watermarking methods focus on practical design, and theoretical research on black-box model watermarking is lacking. As a result, algorithm design relies on trial-and-error or data-driven techniques, which limits deeper understanding and principled optimization.
3. **Introduction of a game-theoretic framework**: To fill this theoretical gap, the authors introduce a game-theoretic framework that models the confrontation between the model defender and the model attacker. By constructing each player's payoff function and determining the optimal response, the paper enriches the theoretical basis of model watermarking and offers theoretical support for developing new watermarking schemes.

### Specific problem description

- **Challenges of black-box model watermarking**: Black-box model watermarking assumes that the watermark extractor has no knowledge of the target model's internal details and can extract the watermark only by interacting with the model. The central question in this setting is how to keep the watermark effective while preserving the model's performance on its original task.
- **Application of game theory**: Modeling black-box model watermarking as a game clarifies the dynamic relationship between the defender and the attacker. The defender aims to embed the watermark without degrading the model's performance on the original task, while the attacker aims to destroy or remove the watermark.

### Main contributions of the paper

- **Theoretical framework**: A partial-cooperation game framework is proposed that combines elements of competition and cooperation between the defender and the attacker, broadening the analytical perspective of adversarial machine learning.
- **Optimal strategy analysis**: Through mathematical derivation, the optimal response strategies of the defender and the attacker are determined, revealing how the robustness of different watermarked models relates to different attack intensities.
- **Practical application prospects**: The analysis provides theoretical guidance for future DNN model watermarking design and emphasizes the importance of making watermarked models robust to real-world attacks.

In conclusion, the paper addresses the lack of a theoretical basis in existing black-box model watermarking techniques by introducing a game-theoretic framework, and it offers new ideas and methods for strengthening the intellectual property protection of DNN models.
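The black-box setting described above, in which the verifier can only query the model, can be illustrated with a minimal sketch. It is not the paper's scheme: the function names, the toy dictionary-backed "models", and the 0.9 decision threshold are all assumptions made for illustration. The idea is that the defender keeps a secret set of trigger inputs with assigned labels and claims ownership when a suspect model reproduces enough of them.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def verify_watermark(
    query: Callable[[Hashable], int],
    triggers: Sequence[Hashable],
    secret_labels: Sequence[int],
    threshold: float = 0.9,  # assumed decision threshold, not from the paper
) -> Tuple[float, bool]:
    """Query a model as a black box and check how many triggers it matches.

    Only the model's outputs are used; no weights or internals are inspected.
    Returns the trigger match rate and whether it reaches the threshold.
    """
    if not triggers or len(triggers) != len(secret_labels):
        raise ValueError("triggers and secret_labels must be non-empty and equal in length")
    hits = sum(1 for x, y in zip(triggers, secret_labels) if query(x) == y)
    rate = hits / len(triggers)
    return rate, rate >= threshold


def make_toy_model(backdoor: Dict[Hashable, int]) -> Callable[[Hashable], int]:
    """Hypothetical stand-in for a DNN: memorised triggers, else class 0."""
    def query(x: Hashable) -> int:
        return backdoor.get(x, 0)
    return query


# Hypothetical secret trigger set held by the defender.
triggers = ["t0", "t1", "t2", "t3"]
secret_labels = [3, 1, 4, 1]

# A watermarked model answers every trigger with its secret label.
marked = make_toy_model(dict(zip(triggers, secret_labels)))
# An attacked model has "forgotten" all but one trigger.
attacked = make_toy_model({"t0": 3})
```

Calling `verify_watermark(marked, triggers, secret_labels)` reports a full match, while the attacked model falls below the assumed threshold. This is the tension the game formalizes: a stronger attack lowers the match rate, but in practice it also risks degrading the model on its original task.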

A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking