Abstract: Traditionally, protein structure comparison has focused on global similarity between two structures. Recent research has turned to finding local structural features shared among a group of proteins. Such shared features are called spatial motifs and correspond to amino-acid packing patterns that may be implicated in function shared among the proteins in the group [4, 5, 6]. Searching for spatial motifs shared among multiple proteins yields fewer spurious results, and features with better statistical significance, than pairwise analysis. However, the basic techniques involved are considerably more complex. We propose to demonstrate the current state of our efforts on this problem. Our most recent implementations locate shared spatial motifs among a group of several dozen protein structures in tens of seconds. The motifs are sequence-order independent and may occur in every member of a group of proteins or in a significant fraction of them (as specified by a threshold parameter). The spatial motif matching process accommodates the variation inherent in structure determination. With our current software we expect to provide real-time responses to queries submitted by users in the ISMB demo session. Our command-line interface and simple graphical user interface (GUI), shown in Figure 1, are being extended. We intend to present a web-based interface with a fully integrated GUI to our server, which implements our algorithms and provides access to PDB structures and previously determined spatial motifs. The kernel of our software package is a subgraph mining algorithm that detects all frequent subgraphs in a graph database, given a user-specified minimum frequency. Our algorithm uses the pattern growth paradigm [3] with an efficient depth-first enumeration scheme to search the graph space for frequent subgraphs.
The most recent algorithm incorporates several improvements: it takes into account the properties of protein 3D structural graphs, searches only for maximal subgraphs, and incorporates constraints on which motifs are of interest [1, 2]. Using the tool, we are able to locate common, functionally correlated motifs in proteins with different global structures, such as an NAD-binding motif in proteins with different folds [1]; such motifs are hard to identify using sequence or global structure comparison. The FFSM algorithm [3] is written in C++ and was compiled and tested in a Linux environment. The software is freely downloadable from http://www.cs.unc.edu/~huan/FFSM.html. We will demonstrate an improved version, CliqueHashing, which incorporates the improvements discussed above and will soon be released at the same web site.

[Figure 1: GUI input form. Input: PDB IDs (enter one or more, separated by spaces)]
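
The frequent-subgraph search described in the abstract can be illustrated with a minimal Python sketch. This is not FFSM or CliqueHashing: it assumes each protein has already been turned into a small labeled graph (nodes are residues labeled by amino-acid type, edges are contacts), it restricts patterns to fully connected node sets identified only by their sorted labels, and it ignores edge labels and geometric tolerance. The function names `mine_frequent_cliques` and `maximal_patterns`, the input layout, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def mine_frequent_cliques(graphs, min_support):
    """Depth-first pattern growth over label-sorted cliques (illustrative only).

    graphs: list of (labels, edges), where labels maps node -> label and
    edges is an iterable of (u, v) pairs.
    Returns {pattern: support}; a pattern is a sorted tuple of labels and
    its support is the number of graphs containing a matching clique.
    """
    adjacency = []
    for _, edges in graphs:
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        adjacency.append(adj)

    frequent = {}

    def grow(pattern, embeddings):
        # embeddings: graph index -> list of node tuples matching the pattern
        frequent[pattern] = len(embeddings)
        extensions = defaultdict(lambda: defaultdict(list))
        for gi, node_tuples in embeddings.items():
            labels, adj = graphs[gi][0], adjacency[gi]
            for emb in node_tuples:
                last_key = (labels[emb[-1]], emb[-1])
                # A candidate must be adjacent to every node already matched.
                candidates = set.intersection(*(adj[n] for n in emb))
                for c in candidates:
                    # Grow only in increasing (label, node) order so that
                    # each clique is generated exactly once.
                    if (labels[c], c) > last_key:
                        extensions[labels[c]][gi].append(emb + (c,))
        for label in sorted(extensions):
            # Prune: extend only patterns found in enough graphs.
            if len(extensions[label]) >= min_support:
                grow(pattern + (label,), extensions[label])

    seeds = defaultdict(lambda: defaultdict(list))
    for gi, (labels, _) in enumerate(graphs):
        for node, label in labels.items():
            seeds[label][gi].append((node,))
    for label in sorted(seeds):
        if len(seeds[label]) >= min_support:
            grow((label,), seeds[label])
    return frequent


def maximal_patterns(frequent):
    """Keep patterns that are not contained in a larger frequent pattern."""
    def contained(p, q):
        return p != q and not (Counter(p) - Counter(q))
    return sorted(p for p in frequent
                  if not any(contained(p, q) for q in frequent))


# Toy input: three tiny "proteins" with one-letter residue labels.
toy_graphs = [
    ({0: "H", 1: "D", 2: "S", 3: "G"}, [(0, 1), (0, 2), (1, 2), (0, 3)]),
    ({0: "S", 1: "H", 2: "D", 3: "A"}, [(0, 1), (0, 2), (1, 2), (1, 3)]),
    ({0: "H", 1: "D", 2: "S"}, [(0, 1), (0, 2)]),
]
frequent = mine_frequent_cliques(toy_graphs, min_support=2)
maximal = maximal_patterns(frequent)
```

With `min_support=2`, the D/H/S triangle present in the first two toy graphs is the only maximal pattern. The support threshold plays the role of the "significant fraction" parameter mentioned above, and the post-filter in `maximal_patterns` is a simple stand-in for searching only for maximal subgraphs; a real miner would handle general subgraphs with a canonical form rather than sorted labels.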

Rapid Determination of Local Structural Features Common to a Set of Proteins
