DISTANCE-BASED IDENTIFICATION OF STRUCTURE MOTIFS IN PROTEINS USING CONSTRAINED FREQUENT SUBGRAPH MINING
Jun Huan, Deepak Bandyopadhyay, Jan F. Prins, Jack Snoeyink, Alexander Tropsha, Wei Wang
DOI: https://doi.org/10.1142/9781860947575_0029
2006-01-01
Published in: Computational Systems Bioinformatics, pp. 227-238 (2006), Series on Advances in Bioinformatics and Computational Biology.
Affiliations: Jun Huan, Deepak Bandyopadhyay, Jan Prins, Jack Snoeyink, and Wei Wang: Computer Science Department, University of North Carolina at Chapel Hill, USA. Alexander Tropsha: The Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, USA.
Abstract: Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues, and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the Cα atoms of the two residues and (2) a boolean indicating whether the two residues are in physical/chemical contact. Using this representation, a structure motif corresponds to a labeled clique that occurs frequently among the graphs representing the protein structures. The pairwise distance constraints on each edge in a clique serve to limit the variation in geometry among different occurrences of a structure motif. We present an efficient constrained subgraph mining algorithm to discover structure motifs in this setting.
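As an illustration of the graph representation described above, the following is a minimal Python sketch, not the authors' implementation. The function names `build_residue_graph` and `is_clique`, the distance cutoff, the bin width used to discretize distances, and the use of a plain distance threshold to decide "contact" are all assumptions made for this example; the abstract does not specify how contact or distance labels are computed.

```python
import math
from itertools import combinations


def build_residue_graph(residues, max_dist=11.5, bin_width=1.5, contact_cutoff=6.5):
    """Build a labeled graph from residues given as (residue_type, (x, y, z)).

    Coordinates are taken to be C-alpha positions in angstroms.
    All three numeric parameters are hypothetical values chosen for illustration.
    Returns (vertex_labels, edges) where edges maps (i, j) with i < j to
    (distance_bin, in_contact).
    """
    labels = [name for name, _ in residues]
    edges = {}
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        d = math.dist(a, b)  # Euclidean distance between the two C-alpha atoms
        if d > max_dist:
            continue  # assumption: residues farther apart than max_dist get no edge
        # Edge label: (1) discretized distance, (2) contact flag.
        # Assumption: contact is approximated here by a simple distance threshold.
        edges[(i, j)] = (int(d // bin_width), d <= contact_cutoff)
    return labels, edges


def is_clique(vertices, edges):
    """Return True if every pair of the given vertices is joined by an edge."""
    return all((min(u, v), max(u, v)) in edges for u, v in combinations(vertices, 2))


# Toy example with made-up coordinates (not taken from any real structure).
toy = [("HIS", (0, 0, 0)), ("ASP", (4, 0, 0)), ("SER", (0, 9, 0)), ("GLY", (30, 0, 0))]
labels, edges = build_residue_graph(toy)
```

In this toy input, the first three residues are pairwise within the cutoff and so form a labeled clique, while the fourth is too far from all of them to receive any edge. A frequent-clique miner would then look for cliques whose vertex and edge labels recur across many such graphs.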
Compared with contact graph representations, the number of spurious structure motifs is greatly reduced. Using this algorithm, structure motifs were located for several SCOP families including the Eukaryotic Serine Proteases, Nuclear Binding Domains, Papain-like Cysteine Proteases, and FAD/NAD-linked Reductases. For each family, we typically obtain a handful of motifs within seconds of processing time. The occurrences of these motifs throughout the PDB were strongly associated with the original SCOP family, as measured using a hypergeometric distribution. The motifs were found to cover functionally important sites such as the catalytic triad for Serine Proteases and the co-factor binding sites for Nuclear Binding Domains. Because many motifs are highly family-specific, they can be used to classify new proteins or to provide functional annotation in structural genomics projects.
Keywords: protein structure comparison; structure motif; graph mining; clique
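The abstract states that the association between motif occurrences and a SCOP family was measured using a hypergeometric distribution, without giving the exact formulation. One common way to do this is an upper-tail enrichment test, sketched below in Python as an assumption rather than as the authors' procedure. The function name `hypergeom_enrichment_pvalue` and its parameter names are hypothetical, and the numbers in the example are made up.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, family_size, motif_hits, family_hits):
    """Upper-tail hypergeometric probability P(X >= family_hits).

    population  : total number of proteins searched (e.g. a PDB subset)
    family_size : number of those proteins that belong to the family
    motif_hits  : number of proteins in which the motif occurs
    family_hits : number of motif-containing proteins that belong to the family

    A small value suggests the motif occurs in the family more often than
    expected if motif-containing proteins were drawn at random.
    """
    total = comb(population, motif_hits)
    upper = min(family_size, motif_hits)
    tail = 0.0
    for i in range(family_hits, upper + 1):
        # Ways to pick i family members and (motif_hits - i) non-members.
        tail += comb(family_size, i) * comb(population - family_size, motif_hits - i) / total
    return tail


# Made-up numbers: 10 proteins, 4 in the family, motif found in 3, 2 of them in the family.
p = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

For realistic inputs (thousands of structures, a family of a few dozen members), a motif that occurs almost exclusively inside the family yields a very small tail probability, which is the sense in which a motif can be called family-specific.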