A New Method Based on Coding Sequence Density to Cluster Bacteria

Nan Sun, Rui Dong, Shaojun Pei, Changchuan Yin, Stephen S.-T. Yau
DOI: https://doi.org/10.1089/cmb.2019.0509
IF: 1.549
2020-01-01
Journal of Computational Biology
Abstract: Bacterial evolution is an important field of study, and biological sequences are often used to construct phylogenetic relationships. Multiple sequence alignment is very time-consuming and cannot handle large-scale bacterial genome data in a reasonable time. Hence, we propose a new mathematical method, the joining density vector (JDV) method, to cluster bacteria; it characterizes the features of the coding sequences (CDSs) in a DNA sequence. Coding sequences carry the genetic information used to synthesize proteins. The correspondence between a genomic sequence and its JDV is one-to-one. The JDV reflects the statistical characteristics of a genomic sequence, and large amounts of data can be analyzed with this approach. We apply the method to phylogenetic analysis of four bacterial data sets at the genus and species levels. The resulting phylogenetic trees show that our method accurately describes the evolutionary relationships of bacterial coding sequences and that it is faster than ClustalW and existing alignment-free methods.
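The abstract does not spell out how a JDV is computed, so what follows is only a minimal sketch of the general alignment-free workflow it describes: map each genome to a fixed-length vector, compute pairwise distances, and build a tree. The feature vector here is a hypothetical stand-in, not the paper's JDV. It summarizes a 0/1 CDS-coverage mask by its density and by the mean and variance of the covered positions, assuming CDS annotations are given as 0-based, end-exclusive intervals. Unlike the JDV, which the abstract describes as one-to-one with the sequence, these three statistics can coincide for different genomes. The tree step uses UPGMA (average linkage) on Euclidean distances, which is one common choice and not necessarily the one used in the paper. The names `cds_feature_vector` and `upgma` are illustrative only.

```python
import math
from itertools import combinations


def cds_feature_vector(genome_length, cds_intervals):
    """Hypothetical stand-in for a JDV (NOT the paper's definition).

    Builds a 0/1 mask marking bases covered by any CDS, then returns
    (density, mean_pos, var_pos), with positions scaled to [0, 1).
    cds_intervals: list of (start, end) pairs, 0-based, end exclusive.
    """
    covered = [False] * genome_length
    for start, end in cds_intervals:
        for i in range(max(0, start), min(genome_length, end)):
            covered[i] = True
    positions = [i / genome_length for i, c in enumerate(covered) if c]
    n = len(positions)
    if n == 0:
        return (0.0, 0.0, 0.0)
    density = n / genome_length
    mean_pos = sum(positions) / n
    var_pos = sum((p - mean_pos) ** 2 for p in positions) / n
    return (density, mean_pos, var_pos)


def upgma(labels, vectors):
    """UPGMA (average-linkage) clustering on Euclidean distances.

    Returns a Newick topology string without branch lengths.
    """
    # cluster id -> (newick fragment, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # distances keyed by (smaller id, larger id)
    dist = {(i, j): math.dist(vectors[i], vectors[j])
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # merge the closest pair of clusters (a < b by key construction)
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])
        na, sa = clusters.pop(a)
        nb, sb = clusters.pop(b)
        del dist[(a, b)]
        for k in clusters:
            d_ak = dist.pop((min(a, k), max(a, k)))
            d_bk = dist.pop((min(b, k), max(b, k)))
            # size-weighted average of the merged clusters' distances to k
            dist[(k, next_id)] = (sa * d_ak + sb * d_bk) / (sa + sb)
        clusters[next_id] = (f"({na},{nb})", sa + sb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Toy annotations: A and B have similar CDS layouts, C does not.
    genomes = {
        "A": (1000, [(0, 400), (500, 900)]),
        "B": (1000, [(0, 410), (500, 890)]),
        "C": (1000, [(100, 200)]),
    }
    labels = list(genomes)
    vectors = [cds_feature_vector(length, cds) for length, cds in genomes.values()]
    print(upgma(labels, vectors))  # prints (C,(A,B));
```

Each vector costs one pass over its genome, and the pairwise distance step then depends only on the number of genomes and the (fixed) vector length, not on genome length. That is the general reason alignment-free approaches can scale to data sets where multiple sequence alignment becomes impractical.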