Abstract: Traditionally, protein structure comparison has focused on global similarity between two structures. Recent research has instead focused on finding local structural features shared among a group of proteins. Such shared features are called spatial motifs and correspond to amino-acid packing patterns that may be implicated in function shared among the proteins in the group [4, 5, 6]. Searching for spatial motifs shared among multiple proteins yields fewer spurious results, and features of higher statistical significance, than pairwise analysis. However, the basic techniques involved are considerably more complex.

We propose to demonstrate the current state of our efforts on this problem. Our most recent implementations locate shared spatial motifs among a group of several dozen protein structures in tens of seconds. The motifs are sequence-order independent and may occur in every member of a group of proteins or in a significant fraction of them (as specified by a threshold parameter). The spatial motif matching process accommodates the variation inherent in structure determination. With our current software we expect to provide real-time responses to queries submitted by users in the ISMB demo session. Our command-line interface and simple graphical user interface (GUI), shown in Figure 1, are being extended. We intend to present a web-based interface with a fully integrated GUI to our server, which implements our algorithms and provides access to PDB structures and previously determined spatial motifs.

The kernel of our software package is a subgraph mining algorithm that detects all frequent subgraphs in a graph database, given a user-specified minimum frequency. Our algorithm uses the pattern growth paradigm [3] with an efficient depth-first enumeration scheme to search the graph space for frequent subgraphs.
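To make the idea of frequency-thresholded, depth-first pattern growth concrete, the following is a minimal sketch and not the FFSM implementation. It rests on simplifying assumptions that do not come from the abstract: each protein is a small graph with residue-type labels on nodes and unlabeled edges, and only fully connected patterns (cliques) are mined, so a pattern can be identified by its sorted tuple of labels. The function names, the toy graphs, and the `min_fraction` parameter are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Return a node -> set-of-neighbours map for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mine_frequent_cliques(graphs, min_fraction):
    """Depth-first pattern growth over labeled cliques (illustrative only).

    graphs: list of (labels, edges); labels maps node -> label and edges is
    a list of (u, v) pairs without self-loops.
    Returns {sorted label tuple: number of graphs containing that clique}.
    """
    min_count = math.ceil(min_fraction * len(graphs))
    adjacency = [build_adjacency(edges) for _, edges in graphs]
    frequent = {}

    def support(occurrences):
        # Support counts graphs, not individual occurrences.
        return len({g for g, _ in occurrences})

    def grow(pattern, occurrences):
        frequent[pattern] = support(occurrences)
        extensions = defaultdict(set)
        for g, nodes in occurrences:
            labels = graphs[g][0]
            # Nodes adjacent to every member keep the pattern a clique.
            common = set.intersection(*(adjacency[g][n] for n in nodes))
            for c in common:
                # Grow only with non-decreasing labels so that each
                # pattern is generated from exactly one parent.
                if labels[c] >= pattern[-1]:
                    extensions[pattern + (labels[c],)].add((g, nodes | {c}))
        for new_pattern in sorted(extensions):
            # Anti-monotone pruning: infrequent patterns are never grown.
            if support(extensions[new_pattern]) >= min_count:
                grow(new_pattern, extensions[new_pattern])

    seeds = defaultdict(set)
    for g, (labels, _) in enumerate(graphs):
        for node, label in labels.items():
            seeds[(label,)].add((g, frozenset([node])))
    for pattern in sorted(seeds):
        if support(seeds[pattern]) >= min_count:
            grow(pattern, seeds[pattern])
    return frequent


# Toy "proteins": node labels are one-letter residue codes (made-up data).
graphs = [
    ({0: "H", 1: "D", 2: "S", 3: "G"}, [(0, 1), (1, 2), (0, 2), (2, 3)]),
    ({0: "S", 1: "H", 2: "D", 3: "A"}, [(0, 1), (1, 2), (0, 2), (0, 3)]),
    ({0: "H", 1: "D", 2: "G"}, [(0, 1), (1, 2)]),
]
frequent = mine_frequent_cliques(graphs, min_fraction=0.6)
print(frequent[("D", "H", "S")])  # number of toy graphs containing a D-H-S clique
```

Because support can only shrink as a pattern grows, pruning an infrequent pattern also discards all of its extensions, which is what keeps a depth-first search of this kind tractable.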
The most recent algorithm incorporates several improvements: it takes into account the properties of protein 3D structural graphs, searches only for maximal subgraphs, and incorporates constraints on which motifs are of interest [1, 2]. Using the tool, we are able to locate common, functionally correlated motifs in proteins with different global structures, such as an NAD binding motif in proteins with different folds [1]; such motifs are hard to identify using sequence or global structure comparison. The FFSM algorithm [3] is written in C++ and has been compiled and tested under Linux. The software is freely downloadable from http://www.cs.unc.edu/~huan/FFSM.html. We will demonstrate an improved version, CliqueHashing, which incorporates the improvements discussed above and will soon be released at the same web site.

[Figure 1: GUI input form. Input: PDB IDs (enter one or more, separated by spaces)]
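The restriction to maximal subgraphs mentioned in the abstract can be illustrated with a small post-filter. This is only a sketch under assumptions: patterns are represented as sorted tuples of node labels (which identifies a pattern only for cliques with unlabeled edges), and the filter is applied after mining, whereas the abstract describes searching for maximal subgraphs directly. The function names and the example support counts are hypothetical.

```python
from collections import Counter


def is_sub_pattern(small, large):
    """True if every label in `small` occurs at least as often in `large`."""
    return not (Counter(small) - Counter(large))


def maximal_patterns(frequent):
    """Keep only patterns that are not contained in a larger frequent one.

    frequent: {sorted label tuple: support count}.
    """
    patterns = list(frequent)
    return {
        p: frequent[p]
        for p in patterns
        if not any(len(q) > len(p) and is_sub_pattern(p, q) for q in patterns)
    }


# Hypothetical frequent patterns with their support counts.
frequent = {
    ("D",): 3, ("G",): 2, ("H",): 3, ("S",): 2,
    ("D", "H"): 3, ("D", "S"): 2, ("H", "S"): 2,
    ("D", "H", "S"): 2,
}
maximal = maximal_patterns(frequent)
print(sorted(maximal))
```

Reporting only maximal patterns shortens the output considerably, since every sub-pattern of a frequent pattern is itself frequent and would otherwise be listed separately.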

Learning structural motif representations for efficient protein structure search

FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding

DeepSF: deep convolutional neural network for mapping protein sequences to folds

PiFold: Toward effective and efficient protein inverse folding

Learning Protein Embedding to Improve Protein Fold Recognition Using Deep Metric Learning

Distance-based protein folding powered by deep learning

CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation

Improving protein fold recognition using triplet network and ensemble deep learning

CPE-Pro: A Structure-Sensitive Deep Learning Model for Protein Representation and Origin Evaluation

Structure-based, deep-learning models for protein-ligand binding affinity prediction

Computational Protein Design with Deep Learning Neural Networks

DeepFold: Enhancing Protein Structure Prediction through Optimized Loss Functions, Improved Template Features, and Re-optimized Energy Function

Structure-based protein design with deep learning

Deep Learning in Protein Structural Modeling and Design

Learning protein sequence embeddings using information from structure

Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?

One-sided design of protein-protein interaction motifs using deep learning

Rapid Determination of Local Structural Features Common to a Set of Proteins

Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space

Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks