AMP-Diffusion: Integrating Latent Diffusion with Protein Language Models for Antimicrobial Peptide Generation

Tianlai Chen, Pranay Vure, Rishab Pulugurta, Pranam Chatterjee
DOI: https://doi.org/10.1101/2024.03.03.583201
2024-03-06
Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a potent class of generative models, demonstrating exemplary performance across diverse AI domains such as computer vision and natural language processing. In the realm of protein design, while there have been advances in structure-based, graph-based, and discrete sequence-based diffusion, the exploration of continuous latent space diffusion within protein language models (pLMs) remains nascent. In this work, we introduce AMP-Diffusion, a latent space diffusion model tailored for antimicrobial peptide (AMP) design, harnessing the capabilities of the state-of-the-art pLM, ESM-2, to generate functional AMPs for downstream experimental application. Our evaluations reveal that peptides generated by AMP-Diffusion align closely in both pseudo-perplexity and amino acid diversity when benchmarked against experimentally-validated AMPs, and further exhibit relevant physicochemical properties similar to these naturally-occurring sequences. Overall, these findings underscore the biological plausibility of our generated sequences and pave the way for their empirical validation. In total, our framework motivates future exploration of pLM-based diffusion models for peptide and protein design.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses how to use deep generative models to design antimicrobial peptides (AMPs). Although diffusion models based on structures, graphs, and discrete sequences have advanced, continuous latent space diffusion within protein language models (pLMs) is still in its early stages. The paper proposes AMP-Diffusion, a latent space diffusion model designed specifically for AMPs, which builds on the state-of-the-art pLM ESM-2 to generate functional AMPs de novo for experimental validation. AMP-Diffusion adds Gaussian noise to the latent representation of a protein sequence in a forward process, then reconstructs the sequence by removing that noise step by step in a learned reverse process. After training, the generated peptides closely match experimentally validated AMPs in pseudo-perplexity and amino acid diversity. They also show physicochemical properties similar to those of natural AMPs, such as charge, hydrophobicity, and isoelectric point, which supports the biological plausibility of the generated sequences. The paper compares AMP-Diffusion with other models, including HydrAMP, PepCVAE, and AMPGAN, and reports better results on multiple metrics. When scored with HydrAMP used as an external classifier, peptides generated by AMP-Diffusion also receive high predicted probabilities of being antimicrobial and of being active against Escherichia coli. In conclusion, this work bridges pLMs and diffusion models, providing a new tool for AMP design and opening directions for future protein design research, including experimental validation of the generated peptides' activity, generation of peptides with specified properties, and extension of the method to a broader range of protein engineering applications.
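The noising step described above can be illustrated with a minimal sketch of the standard DDPM forward process applied to a latent vector. This is not the authors' implementation: the linear noise schedule, its endpoint values, the number of steps, and the toy four-dimensional "latent" (a stand-in for an ESM-2 embedding) are all assumptions chosen for illustration, and the function names are hypothetical. The learned denoiser is omitted; the last helper only shows that a perfect noise estimate recovers the clean latent.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, predicted_noise, t, alpha_bars):
    """Invert the noising step given an estimate of the added noise."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(xt, predicted_noise)]


rng = random.Random(0)                      # fixed seed for reproducibility
betas = linear_beta_schedule(1000)          # assumed T = 1000 steps
alpha_bars = cumulative_alphas(betas)

latent = [0.5, -1.2, 0.3, 2.0]              # toy stand-in for a pLM embedding
noisy, eps = add_noise(latent, 500, alpha_bars, rng)

# In a trained model, a network would predict eps from (noisy, t).
# Passing the true noise here acts as an oracle denoiser.
restored = recover_x0(noisy, eps, 500, alpha_bars)
```

In a full pipeline under these assumptions, a denoising network would be trained to predict `eps` from the noisy latent and the step index, and the denoised latents would then be decoded back into amino acid sequences.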