Detection of circular permutations by Protein Language Models

Yue Hu, Bin Huang, Chunzi Zang
2024-08-06
Abstract: Protein circular permutations are crucial for understanding protein evolution and functionality. Traditional detection methods, whether sequence-based or structure-based, struggle with accuracy and computational efficiency, and structure-based methods are further limited by treating proteins as rigid bodies. The plmCP method, which uses a protein language model, speeds up detection and improves the accuracy of identifying circular permutations. Because it acknowledges structural flexibility, it contributes significantly to protein research and engineering.
Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the problem of detecting circular permutations in proteins. Specifically:

1. **Research Background**: Circular permutation in proteins is an important phenomenon that reflects the diversity and adaptability of proteins in terms of evolution and function. Traditional detection methods include sequence-based and structure-based approaches, but these methods are limited by insufficient accuracy and low computational efficiency.
2. **Core Issues**: Traditional methods face two major challenges when detecting circular permutations in proteins:
   - Accuracy Issues: Sequence-based methods may sacrifice some accuracy, especially for proteins with distant evolutionary relationships.
   - Computational Efficiency Issues: Structure-based methods, while accurate, are computationally intensive and require known 3D structural information.
3. **Proposed New Method**: The paper proposes a new method, plmCP (circular permutation detection based on protein language models). This method uses a protein language model (such as ESM-1b) to generate an embedding vector for each amino acid. It then constructs density matrices and scoring matrices and applies the Smith-Waterman algorithm for optimal local alignment to identify circular permutations in proteins.
4. **Validation and Comparison**: To validate the new method, the authors selected several pairs of proteins with distant evolutionary relationships (such as 3CNA and 2PEL) and compared plmCP with various existing methods (such as TM-align, CE-CP, and SeqCP). The results show that plmCP detects circular permutations well and outperforms the other methods, especially on protein pairs that have insertional structural variations.

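
The general idea in item 3 (per-residue embeddings, a scoring matrix, then Smith-Waterman local alignment) can be illustrated with a minimal Python sketch. It is not the authors' plmCP implementation. Several things here are assumptions made for illustration only: random vectors stand in for ESM-1b embeddings, the scoring matrix is plain cosine similarity shifted by a hypothetical `offset`, the density-matrix step is omitted, and the permutation is handled by the common trick of duplicating one sequence before aligning. The function names `cosine_score_matrix`, `smith_waterman`, and `detect_circular_permutation` are hypothetical.

```python
import numpy as np


def cosine_score_matrix(emb_a, emb_b):
    """Cosine similarity between every residue embedding of A and of B."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T


def smith_waterman(score, gap=1.0):
    """Local alignment over a precomputed score matrix (linear gap penalty).

    Returns the best score and the aligned (row, column) index pairs.
    """
    n, m = score.shape
    H = np.zeros((n + 1, m + 1))
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(
                0.0,
                H[i - 1, j - 1] + score[i - 1, j - 1],  # match / mismatch
                H[i - 1, j] - gap,                      # gap in B
                H[i, j - 1] - gap,                      # gap in A
            )
            if H[i, j] > best:
                best, best_pos = H[i, j], (i, j)

    # Trace back from the best cell until the local alignment ends.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i, j] > 0:
        if H[i, j] == H[i - 1, j - 1] + score[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i, j] == H[i - 1, j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


def detect_circular_permutation(emb_a, emb_b, offset=0.5, gap=1.0):
    """Align B against A concatenated with itself (assumed duplication trick).

    Returns the index in A where the alignment with B starts, plus the pairs.
    """
    doubled = np.concatenate([emb_a, emb_a], axis=0)
    # Shift similarities so that unrelated residues score below zero.
    score = cosine_score_matrix(doubled, emb_b) - offset
    _, pairs = smith_waterman(score, gap=gap)
    cut = pairs[0][0] % len(emb_a) if pairs else None
    return cut, pairs


# Toy data: B is A rotated by 10 residues, with small noise added.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(30, 32))        # stand-in for real PLM embeddings
emb_b = np.roll(emb_a, -10, axis=0) + 0.05 * rng.normal(size=(30, 32))

cut, pairs = detect_circular_permutation(emb_a, emb_b)
print("estimated permutation point in A:", cut)
print("aligned residues:", len(pairs))
```

Duplicating one sequence lets a single local alignment span the point where the original termini were joined, so the start of the alignment in the doubled sequence gives an estimate of the permutation point. With real embeddings, `offset` and `gap` would need tuning.
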
In summary, this paper aims to improve the accuracy and computational efficiency of circular permutation detection in proteins by developing a new method based on protein language models, so that the evolutionary and functional characteristics of proteins can be better understood.