Abstract: Phylogenetic tree reconstruction requires constructing a multiple sequence alignment (MSA) from the sequences. Computing an optimal MSA for many sequences is computationally difficult. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Errors introduced during MSA construction can therefore affect the subsequent phylogenetic tree construction. To circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum, over all possible tuples of length k, of the differences in tuple frequency between the two sequences, and it can be estimated without an MSA. It has traditionally been used to build a fast guide tree to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and both neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes-Cantor, Kimura, F84 and Tamura-Nei. These four estimators are model-based: each assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators or higher. Furthermore, because the k-tuple distance avoids the need to construct an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
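The definition of the k-tuple distance given in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the frequency differences are normalized or combined, so this sketch assumes relative frequencies and absolute differences; the function names `ktuple_counts`, `ktuple_distance` and `distance_matrix`, the default `k=3`, and the DNA alphabet `ACGT` are hypothetical choices for illustration and not the authors' implementation.

```python
from collections import Counter
from itertools import product


def ktuple_counts(seq, k, alphabet="ACGT"):
    """Count overlapping k-tuples, skipping any that contain other symbols."""
    seq = seq.upper()
    allowed = set(alphabet)
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    )


def ktuple_distance(a, b, k=3, alphabet="ACGT"):
    """Sum of absolute differences in relative k-tuple frequency.

    Assumption: counts are normalized to relative frequencies and the
    differences are taken as absolute values; the source abstract only
    says "sum of the differences in frequency".
    """
    ca = ktuple_counts(a, k, alphabet)
    cb = ktuple_counts(b, k, alphabet)
    na = max(sum(ca.values()), 1)  # guard against sequences shorter than k
    nb = max(sum(cb.values()), 1)
    total = 0.0
    for letters in product(alphabet, repeat=k):  # all possible k-tuples
        word = "".join(letters)
        total += abs(ca[word] / na - cb[word] / nb)
    return total


def distance_matrix(seqs, k=3):
    """Pairwise k-tuple distances for unaligned sequences (symmetric matrix)."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ktuple_distance(seqs[i], seqs[j], k)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

No alignment step appears anywhere in the computation: the sequences may have different lengths, and the resulting matrix could be passed to any distance-based tree builder such as neighbor-joining.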

A New Alignment-Free Sequence Analysis Based on the Distribution of K-tuple

Local Alignment-Free Sequences Based on D2shepp Statistics

A New Distance Metric and Its Application in Phylogenetic Tree Construction

Alignment-Free Sequence Comparison Based on Next Generation Sequencing Reads: Extended Abstract.

Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction.

Alignment-free sequence comparison based on next-generation sequencing reads.

A New Distance Computing Method for DNA Sequences in Phylogenetic Analysis

DNA sequence comparison by a novel probabilistic method

Position-specific statistical Model of DNA Sequences and Its Application for Similarity Analysis.

A Novel Model for DNA Sequence Similarity Analysis Based on Graph Theory

A New Method Based on Coding Sequence Density to Cluster Bacteria.

A Novel K-Word Relative Measure for Sequence Comparison.

PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction.

Prokaryote Phylogeny Without Sequence Alignment: from Avoidance Signature to Composition Distance.

A Novel Measurement of Sequence Dissimilarity and Its Application to Phylogeny

The Power Study About Three Statistics of Alignment-Free Comparison Based On At-Rich Model

Ksak: A high-throughput tool for alignment-free phylogenetics

Similarity analysis of DNA sequences through local distribution of nucleotides in strategic neighborhood

Statistical Phylogenetic Tree Analysis Using Differences of Means

An Information-Based Sequence Distance and Its Application to Whole Mitochondrial Genome Phylogeny

A complete characterization of pairs of binary phylogenetic trees with identical $A_k$-alignments