Abstract: Protein structure is key to understanding protein function and is essential for progress in bioengineering, drug discovery, and molecular biology. Recently, the incorporation of generative AI has significantly improved the power and accuracy of computational protein structure prediction and design. However, ethical concerns such as copyright protection and harmful content generation (biosecurity) pose challenges to the widespread adoption of protein generative models. Here, we investigate whether it is possible to embed watermarks into protein generative models and their outputs for copyright authentication and the tracking of generated structures. As a proof of concept, we propose FoldMark, a two-stage method that serves as a generalized watermarking strategy for protein generative models. FoldMark first pretrains a watermark encoder and decoder, which slightly adjust protein structures to embed user-specific information and faithfully recover that information from the encoded structure. In the second stage, protein generative models are fine-tuned with watermark Low-Rank Adaptation (LoRA) modules to preserve generation quality while learning to generate watermarked structures with high recovery rates. Extensive experiments on open-source protein structure prediction models (e.g., ESMFold and MultiFlow) and de novo structure design models (e.g., FrameDiff and FoldFlow) demonstrate that our method is effective across all these generative models. Meanwhile, our watermarking framework has only a negligible impact on the quality of the original protein structures and is robust to potential post-processing and adaptive attacks.
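The encode-and-recover loop described in the abstract, and the bit recovery rate used to score it, can be made concrete with a deliberately simple toy. The sketch below is a hypothetical stand-in, not FoldMark's learned SE(3)-equivariant encoder and decoder: it hides one bit in each consecutive C-alpha distance using quantization index modulation (QIM), a classic watermarking technique swapped in here purely for illustration. Because inter-atomic distances do not change under rotation or translation, the decoder reads the bits from the watermarked coordinates alone, whatever the structure's global pose. The step size `DELTA` and all function names are assumptions.

```python
import numpy as np

# Hypothetical toy watermark (NOT FoldMark's learned encoder/decoder).
# One bit is hidden in each consecutive C-alpha distance with quantization
# index modulation (QIM). Distances are unchanged by rotation and translation,
# so decoding does not depend on the structure's global pose.

DELTA = 0.02  # assumed quantization step in angstroms


def _quantize(d: np.ndarray, bits: np.ndarray, delta: float = DELTA) -> np.ndarray:
    """Snap each distance to the nearest point of the lattice assigned to its bit."""
    offset = bits * delta / 2.0  # bit 0 -> multiples of delta, bit 1 -> shifted by delta/2
    return delta * np.round((d - offset) / delta) + offset


def embed(ca: np.ndarray, bits: np.ndarray, delta: float = DELTA) -> np.ndarray:
    """Return a watermarked copy of ca (shape [N, 3]), one bit per consecutive bond."""
    if len(bits) > len(ca) - 1:
        raise ValueError("need at least one bond per bit")
    x = ca.astype(float).copy()
    for i, b in enumerate(bits):
        v = x[i + 1] - x[i]
        d = np.linalg.norm(v)
        target = _quantize(np.array([d]), np.array([b]), delta)[0]
        # Rigidly shift residues i+1..N-1 along bond i: only the length of bond i
        # changes; every other consecutive distance is preserved.
        x[i + 1:] += (target - d) * v / d
    return x


def decode(ca: np.ndarray, n_bits: int, delta: float = DELTA) -> np.ndarray:
    """Blindly recover bits: pick the lattice each bond length lies closest to."""
    d = np.linalg.norm(np.diff(ca, axis=0), axis=1)[:n_bits]
    err0 = np.abs(d - _quantize(d, np.zeros(n_bits), delta))
    err1 = np.abs(d - _quantize(d, np.ones(n_bits), delta))
    return (err1 < err0).astype(int)


def recovery_rate(sent: np.ndarray, received: np.ndarray) -> float:
    """Fraction of watermark bits decoded correctly."""
    return float(np.mean(sent == received))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random-walk "backbone" with 3.8 angstrom steps between C-alpha atoms.
    steps = rng.normal(size=(64, 3))
    steps = 3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True)
    ca = np.cumsum(steps, axis=0)
    bits = rng.integers(0, 2, size=32)
    marked = embed(ca, bits)
    # Random proper rotation plus translation of the watermarked structure.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.linalg.det(q))
    moved = marked @ q.T + np.array([10.0, -4.0, 2.5])
    print(recovery_rate(bits, decode(moved, len(bits))))  # prints 1.0
    print(np.sqrt(np.mean(np.sum((marked - ca) ** 2, axis=1))))  # coordinate RMSD in angstroms
```

Each bond length moves by at most `DELTA / 2`, but the rigid shifts accumulate along the chain, and this toy makes no attempt at robustness: perturbing a bond length by more than `DELTA / 4` flips its bit. That trade-off between structural fidelity and recovery rate is the kind of thing a trained encoder-decoder pair is meant to balance.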

What problem does this paper attempt to address?

The main problem this paper addresses is how, amid the rapid development of generative AI, to protect the copyright of protein generative models and prevent their misuse. Specifically, the paper focuses on:

1. **Copyright protection**: As protein generative models are widely shared and used, unauthorized use of generated structures and redistribution of pretrained models for profit are increasing, which harms the interests of the original creators.
2. **Biosecurity**: Powerful protein generative models are open to misuse. For example, they could be used to design new proteins with harmful properties (such as pathogens, toxins, or viruses) that might serve as biological weapons.

To address these problems, the paper proposes FoldMark, a general watermarking method that embeds watermarks into protein generative models and their outputs to enable copyright verification and tracking of generated structures. FoldMark achieves this through a two-stage method:

- **First stage**: Pretrain SE(3)-equivariant watermark encoders and decoders that learn to embed watermark information without compromising structural quality.
- **Second stage**: Fine-tune the protein generative model with watermark low-rank adaptation (LoRA) modules so that it generates structures whose watermarks are recovered at a high rate while generation quality is maintained.

In this way, FoldMark can reliably embed and extract watermark information without degrading protein structure quality, providing an effective copyright protection and tracking mechanism for protein generative models. Experimental results show that FoldMark performs well across a variety of protein generative models and is robust to post-processing and adaptive attacks.
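The second stage relies on low-rank adaptation. As a rough illustration of why LoRA fine-tuning can leave the base model's behavior intact at the start, here is a minimal NumPy sketch of a generic LoRA-wrapped linear layer: the pretrained weight stays frozen and only a rank-`r` correction is learned. The class name `LoRALinear`, the initialization, and the `alpha / rank` scaling follow the common LoRA convention and are assumptions here; how FoldMark wires its watermark LoRA modules into each generative model is not shown.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update scale * B @ A.

    Generic LoRA sketch (assumed convention), not FoldMark's implementation.
    B starts at zero, so before fine-tuning the adapted layer reproduces the
    base layer exactly; training only touches A and B.
    """

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                      # frozen base weights
        self.A = rng.normal(scale=1.0 / rank, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                          # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape [..., d_in]; the output has shape [..., d_out].
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into a single dense matrix for deployment."""
        return self.weight + self.scale * self.B @ self.A

    def n_trainable(self) -> int:
        """Number of trainable parameters: rank * (d_in + d_out)."""
        return self.A.size + self.B.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16))
    layer = LoRALinear(W, rank=2, alpha=4.0)
    x = rng.normal(size=(5, 16))
    print(np.allclose(layer(x), x @ W.T))  # prints True: untouched adapter = base layer
    print(layer.n_trainable(), "trainable vs", W.size, "frozen")
```

Because `B` is zero at initialization, the fine-tuned model starts out identical to the original generator, and only `rank * (d_in + d_out)` parameters move during training. `merged_weight()` shows that the adapter can be folded back into one matrix after training, so inference cost does not change.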

FoldMark: Protecting Protein Generative Models with Watermarking
