Abstract: Traditionally, protein structure comparison has focused on global similarity between two structures. Recent research has instead sought local structural features shared among a group of proteins. Such shared features are called spatial motifs and correspond to amino-acid packing patterns that may be implicated in a function shared by the proteins in the group [4, 5, 6]. Searching for spatial motifs shared among multiple proteins yields fewer spurious results, and features of greater statistical significance, than pairwise analysis does. However, the basic techniques involved are considerably more complex.

We propose to demonstrate the current state of our efforts on this problem. Our most recent implementations locate shared spatial motifs among a group of several dozen protein structures in tens of seconds. The motifs are sequence-order independent and may occur in every member of a group of proteins or in a significant fraction of them (as specified by a threshold parameter). The spatial motif matching process accommodates the variation inherent in structure determination. With our current software we expect to be able to provide real-time responses to queries submitted by users in the ISMB demo session. Our command-line interface and simple graphical user interface (GUI), shown in Figure 1, are being extended. We intend to present a web-based interface, with a fully integrated GUI, to our server, which implements our algorithms and provides access to PDB structures and previously determined spatial motifs.

The kernel of our software package is a subgraph mining algorithm that detects all frequent subgraphs in a graph database, given a user-specified minimal frequency. Our algorithm uses the pattern growth paradigm [3] with an efficient depth-first enumeration scheme, searching through the graph space for frequent subgraphs.
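To make the notion of a user-specified minimal frequency concrete, the following is a minimal sketch of support counting for the smallest possible patterns, single labeled edges. It is not FFSM: the graph representation (a dictionary of node labels plus an edge list), the residue labels, and the function names `labeled_edges` and `frequent_edge_patterns` are assumptions made for illustration. A pattern-growth miner would extend the surviving patterns one edge at a time and recount their support.

```python
from collections import Counter


def labeled_edges(node_labels, edges):
    """Return the distinct (label, label) pairs that occur as an edge in one graph."""
    return {tuple(sorted((node_labels[u], node_labels[v]))) for u, v in edges}


def frequent_edge_patterns(graphs, min_support):
    """Keep label pairs present in at least `min_support` (a fraction) of the graphs.

    Support counts graphs, not occurrences: a pattern seen twice in one
    graph still contributes 1 for that graph.
    """
    counts = Counter()
    for node_labels, edges in graphs:
        counts.update(labeled_edges(node_labels, edges))
    threshold = min_support * len(graphs)
    return {pattern: n for pattern, n in counts.items() if n >= threshold}


# Hypothetical toy database: three tiny residue-contact graphs.
graphs = [
    ({0: "GLY", 1: "ASP", 2: "SER"}, [(0, 1), (1, 2)]),
    ({0: "ASP", 1: "GLY", 2: "HIS"}, [(0, 1), (0, 2)]),
    ({0: "GLY", 1: "ASP", 2: "HIS"}, [(0, 1), (1, 2)]),
]
frequent = frequent_edge_patterns(graphs, min_support=0.6)
```

With a threshold of 0.6, a pattern must appear in at least two of the three graphs, so the ASP-SER contact, which occurs in only one graph, is discarded.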
The most recent algorithm incorporates several improvements: it takes into account the properties of protein 3D structural graphs, searches only for maximal subgraphs, and incorporates constraints on which motifs are of interest [1, 2]. Using the tool, we are able to locate common, functionally correlated motifs in proteins with different global structures, such as an NAD-binding motif in proteins with different folds [1]; such motifs are hard to identify using sequence or global structure comparison. The FFSM algorithm [3] is written in C++ and has been compiled and tested in a Linux environment. The software is freely downloadable from http://www.cs.unc.edu/~huan/FFSM.html. We will demonstrate an improved version, CliqueHashing, which incorporates the improvements discussed above and will soon be released at the same web site.

[Figure 1: GUI input form. Input: PDB IDs (enter one or more, separated by spaces)]
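The restriction to maximal subgraphs can be illustrated with a small sketch. It assumes, as a deliberate simplification, that each frequent pattern is represented as a set of labeled edges and that set containment stands in for the subgraph test; a real miner would need a subgraph isomorphism check here. The function name `maximal_patterns` and the example patterns are hypothetical and are not taken from FFSM or CliqueHashing.

```python
def maximal_patterns(patterns):
    """Return the patterns that are not a proper subset of any other pattern.

    Each pattern is an iterable of labeled edges. Set containment is used
    as a simplified stand-in for the subgraph relation.
    """
    frozen = [frozenset(p) for p in patterns]
    return [p for p in frozen if not any(p < q for q in frozen)]


# Hypothetical frequent patterns, each a set of residue-label pairs.
frequent = [
    {("ASP", "GLY")},
    {("ASP", "HIS")},
    {("ASP", "GLY"), ("ASP", "HIS")},
    {("GLY", "SER")},
]
maximal = maximal_patterns(frequent)
```

Here the two single-edge patterns contained in the two-edge pattern are dropped, while the unrelated GLY-SER pattern is kept. Reporting only maximal patterns shortens the output without losing information about which larger motifs are frequent.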

Rapid Determination of Local Structural Features Common to a Set of Proteins
