Abstract: In the expanding field of generative artificial intelligence, integrating robust watermarking technologies is essential to protect intellectual property and maintain content authenticity. Traditionally, watermarking techniques have been developed primarily for information-rich media such as images and audio. However, these methods have not been adequately adapted for graph-based data, particularly molecular graphs. Latent 3D graph diffusion (LDM-3DG) is an emerging approach in molecular graph generation. This model effectively manages the complexities of molecular structures, preserving essential symmetries and topological features. We adapt Gaussian Shading, a proven performance-lossless watermarking technique, to the latent graph diffusion domain to protect this sophisticated new technology. Our adaptation simplifies the watermark diffusion process through duplication and padding, making it adaptable and suitable for various message types. We conduct several experiments using the LDM-3DG model on the publicly available QM9 and Drugs datasets to assess the robustness and effectiveness of our technique. Our results demonstrate that the watermarked molecules maintain statistical parity with the original molecules in 9 out of 10 performance metrics. Moreover, they exhibit a 100% detection rate and a 99% extraction rate in a 2D decoded pipeline, while also showing robustness against post-editing attacks.
What problem does this paper attempt to address?
The paper addresses how to protect intellectual property and maintain content authenticity in generative artificial intelligence, specifically for molecular graph data, through effective watermarking. Traditional watermarking techniques target rich media such as images and audio and have not been adequately adapted to graph data, especially molecular graphs. The paper proposes a graph watermarking technique based on Gaussian Shading. The technique simplifies the watermark diffusion process through duplication and padding, which lets it adapt to message types of different lengths. The watermarked molecules keep statistical parity with the originals in molecular graph generation, and the watermark is robust against post-editing attacks.
### Specific Problems and Solutions:
1. **Problem**: Existing watermarking techniques are mainly applicable to rich media such as images and audio, but are insufficiently applied to graph data, especially molecular graph data.
- **Solution**: The paper proposes a method of adapting the Gaussian Shading technique to graph data. By embedding watermarks in the latent space, this method can achieve efficient watermark embedding and extraction while maintaining the quality of the generated molecules.
2. **Problem**: Molecular graph generation models need to protect the content they generate from unauthorized use to prevent intellectual property infringement and misinformation dissemination.
- **Solution**: The paper develops a lightweight watermark embedding and detection framework (GUISE), which can embed and detect watermarks without significantly affecting model performance. Experimental results show that the watermarked molecules are consistent with the original molecules in 9 of 10 performance metrics, with a 100% detection rate and a 99% extraction rate in a 2D decoded pipeline.
3. **Problem**: The generated molecular graphs need to be robust against various attacks to ensure the effectiveness and durability of the watermarks.
- **Solution**: The paper designs a series of attack experiments for molecular graphs, including modifications to the molecular topological structure. Experimental results show that even under attack, the watermarks can still be effectively detected and extracted, proving the robustness of this method.
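The duplication-and-padding idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one message bit per latent dimension, a key-seeded pseudo-random keystream in place of a real stream cipher, majority voting over the duplicated copies, and a hypothetical latent size of 250. All function names are hypothetical, and the diffusion model and its inversion are left out, so the latent is passed straight from `embed` to `extract`.

```python
import random

def spread_message(bits, latent_dim):
    """Duplicate the message to fill the latent vector, then zero-pad the rest."""
    copies = latent_dim // len(bits)
    if copies == 0:
        raise ValueError("message is longer than the latent vector")
    return bits * copies + [0] * (latent_dim - copies * len(bits))

def keystream(key, n):
    """Key-seeded pseudo-random bits (assumption: stands in for a stream cipher)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]

def embed(bits, latent_dim, key, rng):
    """Sample a latent whose signs carry the encrypted, duplicated message."""
    spread = spread_message(bits, latent_dim)
    ks = keystream(key, latent_dim)
    latent = []
    for m, k in zip(spread, ks):
        magnitude = abs(rng.gauss(0.0, 1.0))  # half-normal magnitude
        # The encrypted bit picks the half of the Gaussian to sample from.
        # With a uniform keystream the sign is uniform, so each coordinate
        # is still marginally standard normal.
        latent.append(magnitude if (m ^ k) else -magnitude)
    return latent

def extract(latent, msg_len, key):
    """Read signs, decrypt, and majority-vote across the duplicated copies."""
    ks = keystream(key, len(latent))
    recovered = [(1 if z > 0 else 0) ^ k for z, k in zip(latent, ks)]
    copies = len(latent) // msg_len  # padding positions are ignored
    message = []
    for i in range(msg_len):
        votes = sum(recovered[c * msg_len + i] for c in range(copies))
        message.append(1 if 2 * votes > copies else 0)
    return message

message = [1, 0, 1, 1, 0, 0, 1, 0]
latent = embed(message, latent_dim=250, key=1234, rng=random.Random(0))
decoded = extract(latent, msg_len=len(message), key=1234)
```

In this sketch the message length only changes how many copies fit into the latent vector, which is one way to read the claim that duplication and padding make the scheme suitable for messages of different lengths. The redundancy is also what lets majority voting tolerate some sign flips after an attack.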
### Main Contributions:
1. **Successfully adapt the Gaussian Shading technique to the LDM-3DG molecular graph diffusion model**, expanding the application range of this technique.
2. **Evaluate the impact of watermark embedding on model performance through multiple performance indicators**; the results show statistical parity with the original molecules in 9 of 10 metrics.
3. **Introduce, for the first time, attack methods specific to graph data generation**, and verify the robustness of the watermarking technique under these attacks.
### Experimental Setup and Results:
- **Dataset**: The paper uses two public datasets, QM9 and GEOM-Drugs, which contain approximately 134,000 small organic molecules and 450,000 larger molecules respectively.
- **Evaluation Indicators**: Metrics include the chemical validity, uniqueness, atomic stability, and molecular stability of the generated molecules, among others.
- **Experimental Results**: The watermarked molecules are consistent with the original molecules in 9 of 10 performance metrics, with no statistically significant differences. In addition, the watermark detection rate reaches 100% and the extraction rate reaches 99% in a 2D decoded pipeline, and the watermark remains robust under various attacks.
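The summary reports detection and extraction rates separately but does not say how the detection decision is made. One common choice for bit-string watermarks, assumed here rather than taken from the paper, is to count how many extracted bits match the expected watermark and compare that count to a threshold chosen for a target false positive rate. Under the null hypothesis that an unwatermarked sample matches each bit with probability 1/2, the match count is binomial. The function names below are hypothetical.

```python
from math import comb

def false_positive_rate(n_bits, threshold):
    """P(at least `threshold` of `n_bits` match by chance), matches ~ Binomial(n, 1/2)."""
    tail = sum(comb(n_bits, k) for k in range(threshold, n_bits + 1))
    return tail / 2 ** n_bits

def min_threshold(n_bits, target_fpr):
    """Smallest match count whose chance probability is at most `target_fpr`."""
    for t in range(n_bits + 1):
        if false_positive_rate(n_bits, t) <= target_fpr:
            return t
    return None  # the target cannot be met with this few bits

# Example: with a 4-bit watermark, requiring all 4 bits to match
# still leaves a 1/16 chance of a false detection.
fpr_all_four = false_positive_rate(4, 4)
```

Longer watermarks allow a much stricter false positive rate at the same fraction of matching bits, which is one reason detection can succeed even when exact extraction of every bit does not.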
Through these studies, the paper provides an effective solution for intellectual property protection in the field of graph data generation.