Validity of Peptide Composition and GC-content for Classifying Bacteria

LI JingKe, JIN Tao, ZHAO Hong
DOI: https://doi.org/10.1360/sspma2015-00054
2015-01-01
Abstract: Over the past decades, many methods have been proposed for constructing genome trees. Among them, the alignment-free K-String Composition Approach shows notable advantages. On the other hand, the species specificity of GC (guanine + cytosine) content, which is in effect the lowest-order version of K-string composition, has long been known, especially in bacteria. Unfortunately, its resolution is too poor for reconstructing phylogeny. Motivated by these facts, this paper studies, for bacteria, the relationship between the composition vector of peptides and the GC-content of the corresponding DNA sequence. A strong correlation is found for short peptides; as peptide length increases, the correlation changes abruptly and quickly tends to vanish. These results indicate that the composition vector of longer peptides does contain more precise species-specific information than GC-content, and can therefore effectively measure the genetic relationships among bacteria, whereas short peptides are not adequate for this purpose.
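The quantities compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`gc_content`, `composition_vector`, `pearson`) are hypothetical, the composition vector here holds raw K-string frequencies only (any normalization or background subtraction used in the paper is not reproduced), and Pearson correlation is assumed as the correlation measure.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def gc_content(dna):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    dna = dna.upper()
    total = sum(dna.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (dna.count("G") + dna.count("C")) / total


def composition_vector(proteins, k):
    """Raw frequency of every peptide of length k over a set of proteins.

    Assumption: plain relative frequencies, with windows containing
    non-standard residues skipped.
    """
    alphabet = set(AMINO_ACIDS)
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(residue in alphabet for residue in word):
                counts[word] += 1
    total = sum(counts.values())
    vector = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        word = "".join(letters)
        vector[word] = counts[word] / total if total else 0.0
    return vector


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Under these assumptions, one would compute `gc_content` for each genome and the frequency of a given peptide from `composition_vector` for the same genome, then pass the two lists (one value per genome) to `pearson`. The vector has 20^k components, so its dimension grows quickly with peptide length k.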