Enhancing Biosecurity with Watermarked Protein Design

Yanshuo Chen, Zhengmian Hu, Yihan Wu, Ruibo Chen, Yongrui Jin, Wei Chen, Heng Huang
DOI: https://doi.org/10.1101/2024.05.02.591928
2024-05-05
Abstract: The biosecurity issue arises as the capability of deep learning-based protein design has rapidly increased in recent years. To address this problem, we propose a new general framework for adding watermarks to protein sequences designed by various sampling-based deep learning models. Compared to currently proposed protein design regulation procedures, watermarks ensure robust traceability and maintain the privacy of protein sequences. Moreover, using our framework does not decrease the performance or accessibility of the protein design tools.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the biosecurity risk created by increasingly capable deep learning-based protein design tools. As these tools become widely available, they could be used to design and synthesize harmful proteins. The paper proposes a framework that embeds watermarks in designed protein sequences to strengthen biosecurity, preserve privacy, and ensure traceability.

Current regulatory approaches mainly control the DNA synthesis step and record all sequences, which can compromise the privacy of the designed protein sequences. The encryption framework proposed by the SecureDNA Foundation, which the paper discusses, protects privacy but lacks traceability. In the proposed watermark framework, by contrast, researchers obtain private keys from an authoritative institution and use them to watermark the sequences generated by protein design tools. DNA synthesis requests can then be verified locally without uploading sequence information, and the designer of a sequence can be traced through the watermark.

The framework builds on unbiased watermarking, which does not affect the performance of the model, and only a party holding the correct private key can detect the watermark. According to the paper's experiments, the watermark does not degrade the quality of the designed protein sequences and remains detectable after the sequence undergoes mutations. The framework also lets researchers assert intellectual property rights over their designed sequences. In short, the paper aims to address biosecurity, privacy protection, and sequence traceability together by embedding watermarks in protein design, offering a new and practical regulatory strategy.
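To make the idea of key-based watermarking and local detection more concrete, the following is a minimal toy sketch and not the paper's algorithm. It swaps in a simple Gumbel-max style keyed sampling rule, which preserves the model's per-residue distribution when the keyed numbers behave like uniform random values. Everything here is an assumption made for illustration: the `toy_model` stand-in for a design model, the key strings, the context window `K`, the use of position plus preceding residues as context, and the z-score statistic.

```python
import hashlib
import hmac
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
K = 2  # preceding residues mixed into the keyed context (assumed value)


def toy_model(prefix: str) -> list[float]:
    """Hypothetical stand-in for a design model: a fixed, mildly skewed distribution."""
    weights = [1 + (i % 4) for i in range(len(AMINO_ACIDS))]
    total = sum(weights)
    return [w / total for w in weights]


def keyed_uniform(key: bytes, prefix: str, residue: str) -> float:
    """Pseudo-random number in (0, 1) derived from the key, position, and local context."""
    context = f"{len(prefix)}:{prefix[-K:]}:{residue}"
    digest = hmac.new(key, context.encode(), hashlib.sha256).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so the float stays below 1
    return (bits + 0.5) / 2**52


def generate(key: bytes, length: int) -> str:
    """Pick each residue as argmax_i log(r_i) / p_i, with r_i derived from the key."""
    seq = ""
    for _ in range(length):
        probs = toy_model(seq)
        scores = {
            aa: math.log(keyed_uniform(key, seq, aa)) / p
            for aa, p in zip(AMINO_ACIDS, probs)
            if p > 0
        }
        seq += max(scores, key=scores.get)
    return seq


def detect(key: bytes, seq: str) -> float:
    """z-score of the keyed statistic: near 0 without the right key, large with it."""
    total = sum(
        -math.log(1.0 - keyed_uniform(key, seq[:i], seq[i])) for i in range(len(seq))
    )
    n = len(seq)
    return (total - n) / math.sqrt(n)  # each term is roughly Exp(1) for an unrelated key


def mutate(seq: str, rate: float, seed: int = 0) -> str:
    """Randomly substitute residues to mimic point mutations (substitutions only)."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


designer_key = b"hypothetical-designer-key"
design = generate(designer_key, 120)
z_right = detect(designer_key, design)
z_wrong = detect(b"some-other-key", design)
z_mutated = detect(designer_key, mutate(design, 0.10))
print(f"right key: {z_right:.1f}, wrong key: {z_wrong:.1f}, after mutation: {z_mutated:.1f}")
```

Detection needs only the key and the sequence, so in this sketch a check could run locally without uploading the sequence. Because the context includes the position, this toy version tolerates substitutions but not insertions or deletions; a practical scheme would need a more careful choice of context.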