Prediction of protein domains from sequence information using support vector machines

Shu-Xue Zou,Yanxin Huang,Yan Wang,Chunguang Zhou
DOI: https://doi.org/10.1007/11760191_99
2006-01-01
Abstract:Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. In this paper a promising method for detecting the domain structure of a protein from sequence information alone was presented. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using support vector machines. The overall accuracy of the method for a single protein chains dataset, is about 85%. The result demonstrates that the utility of the method can help not only in predicting the complete 3D structure of a protein but also in the study of proteins’ building blocks and for functional analysis.
What problem does this paper attempt to address?