Abstract: Improving the ability to predict protein function can potentially facilitate research in the fields of drug discovery and precision medicine. Technically, the properties of proteins are directly or indirectly reflected in their sequence and structure information, especially as the protein function is largely determined by its spatial properties. Existing approaches mostly focus on protein sequences or topological structures, while rarely exploiting the spatial properties and ignoring the relevance between sequence and structure information. Moreover, obtaining annotated data to improve protein function prediction is often time-consuming and costly. To this end, this work proposes a novel contrast-aware pre-training framework, called SCOP, for protein function prediction. We first design a simple yet effective encoder to integrate the protein topological and spatial features under the structure view. Then a convolutional neural network is utilized to learn the protein features under the sequence view. Finally, we pretrain SCOP by leveraging two types of auxiliary supervision to explore the relevance between these two views and thus extract informative representations to better predict protein function. Experimental results on four benchmark datasets and one self-built dataset demonstrate that SCOP provides more specific results, while using less pre-training data.
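
To make the distinction between topological and spatial features in the structure view concrete, here is a minimal sketch, not SCOP's actual encoder. It assumes each residue is represented by the 3D coordinates of its C-alpha atom and that two residues are connected when they lie within a distance cutoff. The function name `build_residue_graph`, the 8 Å cutoff, and the toy coordinates are illustrative assumptions that do not come from the abstract.

```python
import math


def build_residue_graph(coords, cutoff=8.0):
    """Build a contact graph from per-residue 3D coordinates (hypothetical helper).

    Returns:
      edges     -- list of (i, j) pairs with i < j: the topological view (who is connected)
      distances -- dict mapping each edge to its Euclidean length: the spatial view
    """
    edges = []
    distances = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:  # the cutoff value is an assumption, not taken from the paper
                edges.append((i, j))
                distances[(i, j)] = d
    return edges, distances


# Toy example: four residues placed on a line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges, distances = build_residue_graph(toy_coords, cutoff=8.0)
```

In this toy case, residues 0 and 3 are farther apart than the cutoff, so they share no edge. A graph encoder that sees only `edges` works with topology alone, while one that also consumes `distances` has access to spatial information about the conformation.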

What problem does this paper attempt to address?

This paper aims to solve several key problems in protein function prediction:

1. **Scarcity of protein labels**: One of the main challenges faced by existing protein function prediction methods is the lack of labeled data. Data on the physicochemical properties and biological functions of proteins are usually obtained through time-consuming and costly wet-laboratory experiments, so such data are very scarce.
2. **Insufficient learning of structural features**: The function of a protein is largely determined by its spatial structure. However, existing sequence-based methods often ignore the spatial structure information of proteins, and most structure-based methods consider only the two-dimensional topological structure of proteins and ignore the spatial features of specific conformations in three-dimensional space, resulting in incomplete learned representations.
3. **Under-utilization of the correlation between sequence and structure**: Protein sequence descriptors and structure descriptors describe proteins at different levels. However, existing methods either learn protein representations from only one perspective or simply extract features from sequences and structures separately. They fail to fully exploit the correlation between sequences and structures, so the learned representations may not be comprehensive enough.

To solve the above problems, the paper proposes a new contrast-aware pre-training framework named SCOP (Sequence-Structure Contrast-Aware Pre-training) for protein function prediction. The main features of SCOP include:

- **Introducing a protein structure encoder** to integrate the topological and spatial features of proteins.
- **Fully utilizing the supervision information in protein sequence-structure pairings** to explore the correlation between these two views.
- **Proposing a contrast-aware pre-training framework** that can learn protein representations without label information.

Experimental results on four benchmark datasets and one self-built dataset show that SCOP provides more specific results while using less pre-training data.
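
The idea of using sequence-structure pairings as label-free supervision can be illustrated with a generic contrastive objective. The sketch below uses a symmetric InfoNCE loss, a common choice for aligning two views. It is an assumption made for illustration, since the summary does not specify SCOP's two auxiliary supervision signals. The names `info_nce` and `symmetric_contrastive_loss`, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchor i to target i among all targets."""
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i with every target, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(seq_emb, struct_emb, temperature=0.1):
    """Average the sequence-to-structure and structure-to-sequence losses."""
    return 0.5 * (
        info_nce(seq_emb, struct_emb, temperature)
        + info_nce(struct_emb, seq_emb, temperature)
    )


# Toy embeddings for two proteins: row i of each list belongs to protein i.
seq_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # structure views that match their sequences
swapped = [[0.1, 0.9], [0.9, 0.1]]   # structure views paired with the wrong protein

loss_aligned = symmetric_contrastive_loss(seq_emb, aligned)
loss_swapped = symmetric_contrastive_loss(seq_emb, swapped)
```

The loss is lower when each protein's sequence embedding is closest to its own structure embedding (`aligned`) than when the pairing is shuffled (`swapped`). No function labels are needed: the only supervision is which sequence belongs to which structure.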

SCOP: A Sequence-Structure Contrast-Aware Framework for Protein Function Prediction
