Rapid Determination of Local Structural Features Common to a Set of Proteins
Jun Huan, Deepak Bandyopadhyay, Jinze Liu, Jan Prins, Jack Snoeyink, Alexander Tropsha, Wei Wang
2008-01-01
Abstract: Traditionally, protein structure comparison has focused on global similarity between two structures. Recent research has instead sought local structural features common to a group of proteins. Such shared features, called spatial motifs, correspond to amino-acid packing patterns that may be implicated in a function shared by the proteins in the group [4, 5, 6]. Compared with pairwise analysis, searching for spatial motifs shared among multiple proteins yields fewer spurious results and features of greater statistical significance. However, the techniques involved are considerably more complex. We propose to demonstrate the current state of our efforts on this problem. Our most recent implementations locate shared spatial motifs among a group of several dozen protein structures in tens of seconds. The motifs are sequence-order independent and may occur in every member of a group of proteins or in a significant fraction of them (as specified by a threshold parameter). The spatial motif matching process accommodates the variation inherent in structure determination. With our current software we expect to provide real-time responses to queries submitted by users in the ISMB demo session. Our command-line interface and the simple graphical user interface (GUI) shown in Figure 1 are being extended. We intend to present a web-based interface with a fully integrated GUI to our server, which implements our algorithms and provides access to PDB structures and previously determined spatial motifs. The kernel of our software package is a subgraph mining algorithm that detects all frequent subgraphs in a graph database, given a user-specified minimum frequency. Our algorithm uses the pattern growth paradigm [3] with an efficient depth-first enumeration scheme to search the graph space for frequent subgraphs.
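As a rough illustration of the mining kernel, the following Python sketch enumerates frequent connected subgraphs by depth-first pattern growth. It is a simplified stand-in, not FFSM or CliqueHashing: duplicates are detected with a brute-force, permutation-based canonical form rather than FFSM's canonical adjacency matrices, edges are unlabeled, and the helper names (`contains`, `canonical`, `mine`, `g`) and the `max_nodes` bound are hypothetical choices made to keep the example small. The minimum frequency is given as a count of graphs, and the toy database of residue-labeled graphs is invented for illustration.

```python
from itertools import permutations

# A labeled graph is a pair (labels, edges): labels is a tuple of node labels
# (here one-letter residue types) and edges is a frozenset of 2-element
# frozensets of node indices. Edges are unlabeled in this sketch.

def contains(pattern, graph):
    """True if `graph` has a label-preserving injective map of `pattern`'s
    nodes under which every pattern edge is also a graph edge."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    mapping = []  # mapping[k] = graph node assigned to pattern node k

    def extend(k):
        if k == len(p_labels):
            return True
        for v in range(len(g_labels)):
            if v in mapping or g_labels[v] != p_labels[k]:
                continue
            # Pattern edges back to already-mapped nodes must exist in graph.
            if all(frozenset((mapping[i], v)) in g_edges
                   for i in range(k) if frozenset((i, k)) in p_edges):
                mapping.append(v)
                if extend(k + 1):
                    return True
                mapping.pop()
        return False

    return extend(0)


def canonical(pattern):
    """Brute-force canonical form: the smallest relabeling over all node
    permutations. Only practical for small patterns."""
    labels, edges = pattern
    best = None
    for perm in permutations(range(len(labels))):
        # perm[old] is the new index given to node `old`.
        new_labels = [None] * len(labels)
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_edges = tuple(sorted(tuple(sorted(perm[v] for v in e)) for e in edges))
        key = (tuple(new_labels), new_edges)
        if best is None or key < best:
            best = key
    return best


def mine(database, min_support, max_nodes=4):
    """Depth-first pattern growth: return every connected pattern with at
    most `max_nodes` nodes that is contained in >= `min_support` graphs."""
    label_set = sorted({l for labels, _ in database for l in labels})
    seen, results = set(), []

    def grow(pattern):
        key = canonical(pattern)
        if key in seen:  # already reached through another growth path
            return
        seen.add(key)
        if sum(contains(pattern, g) for g in database) < min_support:
            return  # infrequent: no supergraph can be frequent, so prune
        results.append(pattern)
        labels, edges = pattern
        n = len(labels)
        # Grow by one edge between two existing nodes ...
        for i in range(n):
            for j in range(i + 1, n):
                e = frozenset((i, j))
                if e not in edges:
                    grow((labels, edges | {e}))
        # ... or by one new labeled node attached to an existing node.
        if n < max_nodes:
            for i in range(n):
                for l in label_set:
                    grow((labels + (l,), edges | {frozenset((i, n))}))

    for l in label_set:
        grow(((l,), frozenset()))
    return results


def g(labels, pairs):
    """Hypothetical helper: build a graph from a label string and index pairs."""
    return tuple(labels), frozenset(frozenset(p) for p in pairs)


db = [g("HDS", [(0, 1), (1, 2)]),
      g("SHDG", [(0, 1), (1, 2), (2, 3)]),
      g("HDA", [(0, 1)])]
for labels, edges in mine(db, min_support=2):
    print(labels, sorted(tuple(sorted(e)) for e in edges))
# ('D',) []
# ('D', 'H') [(0, 1)]
# ('H',) []
# ('S',) []
```

Pruning at an infrequent pattern is safe because support is anti-monotone: a supergraph of a pattern can occur in no more graphs than the pattern itself. The factorial cost of the permutation-based canonical form is the main reason practical miners rely on more efficient canonical codes.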
Our most recent algorithm incorporates several improvements: it takes into account the properties of protein 3D structural graphs, searches only for maximal subgraphs, and incorporates constraints that characterize interesting motifs [1, 2]. Using the tool, we are able to locate common, functionally correlated motifs in proteins with different global structures, such as an NAD-binding motif shared by proteins with different folds [1]; such motifs are hard to identify using sequence or global structure comparison. FFSM [3] is implemented in C++ and has been compiled and tested under Linux. It is freely downloadable from http://www.cs.unc.edu/~huan/FFSM.html. We will demonstrate CliqueHashing, an improved version incorporating the improvements discussed above, which will soon be released at the same web site.
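The graphs mined by such a tool must first be derived from 3D coordinates. The sketch below assumes one simple construction, which may differ from the authors' actual graph definition: residues become nodes labeled by residue type, residues within a hypothetical 8.5 Å cutoff are joined by an edge, and each edge is labeled with its distance rounded down to a 1 Å bin, so that small coordinate differences between structures often yield the same label. The function name `structure_graph` and both parameter values are assumptions, and the coordinates in the example are made up rather than read from a PDB entry.

```python
import math

# Hypothetical parameters (assumptions, not values from the paper).
CUTOFF = 8.5     # contact cutoff in angstroms
BIN_WIDTH = 1.0  # width of each distance bin in angstroms


def structure_graph(residues, cutoff=CUTOFF, bin_width=BIN_WIDTH):
    """Build a labeled contact graph from (residue_type, (x, y, z)) pairs.

    Nodes are labeled by residue type. Two residues within `cutoff` are
    joined by an edge whose label is the index of its distance bin.
    Sequence adjacency is not encoded, so motifs found in this graph are
    sequence-order independent.
    """
    labels = tuple(res_type for res_type, _ in residues)
    edges = {}
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            d = math.dist(residues[i][1], residues[j][1])
            if d <= cutoff:
                edges[(i, j)] = int(d // bin_width)
    return labels, edges


# Invented coordinates: His and Asp 5.0 A apart, Ser far away.
labels, edges = structure_graph([("H", (0.0, 0.0, 0.0)),
                                 ("D", (3.0, 4.0, 0.0)),
                                 ("S", (20.0, 0.0, 0.0))])
print(labels, edges)  # ('H', 'D', 'S') {(0, 1): 5}
```

Fixed bins are the simplest way to tolerate coordinate noise, but two nearly equal distances that straddle a bin boundary (for example 4.98 Å and 5.02 Å) still receive different labels; matching with an explicit distance tolerance avoids that at some extra cost.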