3D-PAF Curve: A Novel Graphical Representation of Protein Sequences for Similarity Analysis

Zengchao Mu, Guojun Li, Haiyan Wu, Xingqin Qi
2016-01-01
Abstract: Based on the physicochemical properties of amino acids, we propose a novel graphical representation of protein sequences, the 3D-PAF curve, which incorporates the accumulative frequencies of adjacent amino acids in the sequence. We then derive an 8-dimensional numerical vector to characterize each 3D-PAF curve. Because a protein sequence corresponds to 12 kinds of 3D-PAF curves, we take a 96-dimensional vector as the feature vector of the protein sequence. The similarity between any two protein sequences can then be measured by the standardized Euclidean distance between their feature vectors. Finally, we apply this method to two data sets (nine ND5 proteins and 35 coronavirus spike proteins) to analyze the similarities of the protein sequences. The results on both data sets demonstrate the validity of our method.
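The final comparison step, the standardized Euclidean distance between feature vectors, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how the 3D-PAF curves or their 8-dimensional descriptors are computed, so the feature vectors are taken as given, the function name `standardized_euclidean_matrix` is hypothetical, and the use of the sample variance (dividing by n - 1) and the skipping of constant components are choices made here, not details confirmed by the paper.

```python
import math


def standardized_euclidean_matrix(vectors):
    """Pairwise standardized Euclidean distances between equal-length vectors.

    Each squared component difference is divided by the variance of that
    component across all input vectors, so components on different scales
    contribute comparably.
    """
    n = len(vectors)
    dim = len(vectors[0])

    # Per-component mean and sample variance across all sequences
    # (assumption: n - 1 in the denominator).
    means = [sum(v[k] for v in vectors) / n for k in range(dim)]
    variances = [
        sum((v[k] - means[k]) ** 2 for v in vectors) / (n - 1)
        for k in range(dim)
    ]

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(dim):
                # Assumption: a component with zero variance carries no
                # information, so it is skipped to avoid dividing by zero.
                if variances[k] > 0:
                    diff = vectors[i][k] - vectors[j][k]
                    total += diff * diff / variances[k]
            dist[i][j] = dist[j][i] = math.sqrt(total)
    return dist


# Toy 2-dimensional vectors standing in for the 96-dimensional feature
# vectors; the numbers are made up for illustration only.
toy_features = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
toy_distances = standardized_euclidean_matrix(toy_features)
```

In the setting described above, `vectors` would hold one 96-dimensional feature vector per protein, and the resulting symmetric matrix could then be passed to a clustering or tree-building procedure of the reader's choice.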