Abstract: Traditionally, protein structure comparison has focused on global similarity between two structures. Recent research has focused on finding local structural features common to a group of proteins. Such shared features are called spatial motifs and correspond to amino-acid packing patterns that may be implicated in a function shared among the proteins in the group [4, 5, 6]. Searching for spatial motifs shared among multiple proteins yields fewer spurious results, and features with greater statistical significance, than pairwise analysis does. However, the basic techniques involved are considerably more complex.

We propose to demonstrate the current state of our efforts on this problem. Our most recent implementations locate shared spatial motifs among a group of several dozen protein structures in tens of seconds. The motifs are sequence-order independent and may occur in every member of a group of proteins or in a significant fraction of them (as specified by a threshold parameter). The spatial motif matching process accommodates the variation inherent in structure determination. With our current software we expect to be able to provide real-time responses to queries submitted by users in the ISMB demo session. Our command-line interface and simple graphical user interface (GUI), shown in Figure 1, are being extended. We intend to present a web-based interface with a fully integrated GUI to our server, which implements our algorithms and provides access to PDB structures and previously determined spatial motifs.

The kernel of our software package is a subgraph mining algorithm that detects all frequent subgraphs in a graph database, given a user-specified minimal frequency. Our algorithm uses the pattern growth paradigm [3] with an efficient depth-first enumeration scheme, searching through the graph space for frequent subgraphs.
The most recent algorithm adds several improvements: it takes into account the properties of protein 3D structural graphs, searches only for maximal subgraphs, and incorporates constraints on which motifs are interesting [1, 2]. Using the tool, we are able to locate common functionally correlated motifs in proteins with different global structures, such as an NAD binding motif in proteins with different folds [1]; such motifs are hard to identify using sequence or global structure comparison. The algorithm FFSM [3] is written in C++ and was compiled and tested in the Linux environment. The software is freely downloadable from http://www.cs.unc.edu/~huan/FFSM.html. We will demonstrate an improved version, CliqueHashing, which incorporates the improvements discussed above and will soon be released at the same web site.

Input: PDB IDs (enter one or more, separated by spaces)
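
The frequent-subgraph search described in the abstract can be illustrated with a minimal sketch. This is not FFSM or CliqueHashing: it assumes, for simplicity, that every node label occurs at most once per graph, so a subgraph is fully described by its set of labeled edges and no subgraph-isomorphism test is needed. Real protein graphs repeat residue labels and require that extra machinery, which is omitted here. All function names (`make_graph`, `support`, `mine_frequent_subgraphs`, `maximal_only`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]      # undirected edge, stored as a sorted pair of node labels
Graph = FrozenSet[Edge]     # assumption: node labels are unique within a graph


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return frozenset((min(a, b), max(a, b)) for a, b in edges)


def support(pattern: Graph, database: List[Graph]) -> int:
    """Count the graphs in the database that contain every edge of the pattern."""
    return sum(1 for g in database if pattern <= g)


def mine_frequent_subgraphs(database: List[Graph], min_support: int) -> Dict[Graph, int]:
    """Depth-first pattern growth: extend each frequent connected pattern by one
    adjacent edge and prune as soon as the support drops below min_support."""
    all_edges = set()
    for g in database:
        all_edges |= g
    frequent_edges = sorted(
        e for e in all_edges if support(frozenset([e]), database) >= min_support
    )
    results: Dict[Graph, int] = {}
    visited = set()

    def grow(pattern: Graph) -> None:
        visited.add(pattern)
        results[pattern] = support(pattern, database)
        nodes = {label for edge in pattern for label in edge}
        for e in frequent_edges:
            # Only add edges that touch the pattern, so it stays connected.
            if e in pattern or not (e[0] in nodes or e[1] in nodes):
                continue
            candidate = pattern | {e}
            if candidate in visited:
                continue
            if support(candidate, database) >= min_support:
                grow(candidate)
            else:
                visited.add(candidate)  # infrequent: no superset can be frequent

    for e in frequent_edges:
        grow(frozenset([e]))
    return results


def maximal_only(results: Dict[Graph, int]) -> List[Graph]:
    """Keep only the frequent patterns that have no frequent proper superset."""
    return [p for p in results if not any(p < q for q in results)]


if __name__ == "__main__":
    db = [
        make_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]),
        make_graph([("A", "B"), ("B", "C"), ("A", "C")]),
        make_graph([("A", "B"), ("B", "C"), ("C", "D")]),
    ]
    frequent = mine_frequent_subgraphs(db, min_support=2)
    for motif in maximal_only(frequent):
        print(sorted(motif), frequent[motif])
```

Here `min_support` plays the role of the threshold parameter: a pattern is reported if it occurs in at least that many graphs. Pruning is sound because any extension of an infrequent pattern is also infrequent, and restricting the output to maximal patterns mirrors the improvement mentioned above.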

Probabilistic Analysis of the Frequencies of Amino Acid Pairs Within Characterized Protein Sequences

Analysis of the Amino Acid Effect on Protein Folding by Atom Pair Contacts

Finding Haplotypic Signatures in Proteins

On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences

Distinguishing Proteins From Arbitrary Amino Acid Sequences

Using the Radial Distributions of Physical Features to Compare Amino Acid Environments and Align Amino Acid Sequences.

Can Simple Codon Pair Usage Predict Protein-Protein Interaction?

Unveiling Conserved Allosteric Hot Spots in Protein Domains from Sequences

Phylogenetic Profiles as a Unified Framework for Measuring Protein Structure, Function and Evolution

Mining Protein Sequence Motifs Representing Common 3D Structures.

Protein Map: An Alignment-Free Sequence Comparison Method Based On Various Properties Of Amino Acids

Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition

Selection of sequence motifs and generative Hopfield-Potts models for protein families

A Combinatorial Perspective of the Protein Inference Problem

Rapid Determination of Local Structural Features Common to a Set of Proteins

Network analysis of synonymous codon usage

The Protein Family Classification in Protein Databases via Entropy Measures

Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes

Protein Space: A Natural Method for Realizing the Nature of Protein Universe

A Protein Map and Its Application

A protein sequence fitness function for identifying natural and nonnatural proteins