Protein Sequence Classification Using Natural Vector and Convex Hull Method.

Yi Wang,Kun Tian,Stephen S. -T. Yau
DOI: https://doi.org/10.1089/cmb.2018.0216
IF: 1.549
2019-01-01
Journal of Computational Biology
Abstract:Protein kinase C (PKC) is a superfamily of enzymes, which regulate numerous cellular responses. The specific function of PKC protein family is mainly governed by its individual protein domains. However, existing protein sequence classification methods based on sequence alignment and sequence analysis models focused little on the domain analysis. In this study, we introduce a novel protein kinase classification method that considers both domain sequence similarity and whole sequence similarity to quantify the evolutionary distance from a specific protein to a protein family. Using the natural vector method, we establish a 60-dimensional space, where each protein is uniquely represented by a vector. We also define a convex hull, consisting of the natural vectors corresponding to all members of a protein family. The sequence similarity between a protein and a protein family, therefore, can be quantified as the distance between the protein vector and the protein family convex hull. We have applied this method in a PKC sample library and the results showed a higher accuracy of classification compared with other alignment-free methods.
What problem does this paper attempt to address?