Combinatorics From Bacterial Genomes

Bailin Hao
DOI: https://doi.org/10.1007/978-3-540-73556-4_2
2007-01-01
Abstract: By visualizing bacterial genome data we have encountered a few neat mathematical problems. The first problem concerns the number of longer strings (of length K + i, i ≥ 1) that are necessarily missing because one or more K-strings are absent. The exact solution may be obtained by using the Goulden-Jackson cluster method in combinatorics and by making use of a special kind of formal language, namely, the factorizable language. The second problem consists in explaining the fine structure observed in one-dimensional K-string histograms of some randomized genomes. The third problem is the uniqueness of reconstructing a protein sequence from its constituent K-peptides. This problem has a natural connection with the number of Eulerian loops in a graph. To tell whether a protein sequence has a unique reconstruction at a given K, the factorizable language again comes to our aid.
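For small K the count in the first problem can be checked by brute force. The sketch below is not the Goulden-Jackson cluster method named in the abstract. It is a swapped-in direct transfer count whose state is the last K - 1 letters, and it assumes a four-letter nucleotide alphabet and that every absent string has the same length K. The function names `count_avoiding` and `missing_longer` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: nucleotide alphabet


def count_avoiding(forbidden, n, alphabet=ALPHABET):
    """Count length-n strings over `alphabet` that contain none of the
    `forbidden` strings (assumed to share one length K) as a substring."""
    forbidden = set(forbidden)
    k = len(next(iter(forbidden)))
    if n < k:
        return len(alphabet) ** n
    # State = last k-1 letters; every (k-1)-prefix is allowed at the start.
    counts = {"".join(p): 1 for p in product(alphabet, repeat=k - 1)}
    for _ in range(n - (k - 1)):
        nxt = {}
        for state, c in counts.items():
            for a in alphabet:
                word = state + a
                if word in forbidden:  # this K-string may not occur
                    continue
                key = word[1:]  # slide the window by one letter
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return sum(counts.values())


def missing_longer(forbidden, i, alphabet=ALPHABET):
    """Number of (K+i)-strings forced to be missing by the absent K-strings."""
    k = len(next(iter(forbidden)))
    n = k + i
    return len(alphabet) ** n - count_avoiding(forbidden, n, alphabet)


print(missing_longer({"AA"}, 1), missing_longer({"AC"}, 1))
```

With K = 2 and i = 1, removing "AA" takes away 7 of the 64 strings of length 3, while removing "AC" takes away 8. The difference comes from "AA" overlapping itself (the string "AAA" is counted only once), and such self-overlaps are exactly what a cluster-method count has to account for.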