Investigating the complexity of the double distance problems
Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx, Jens Stoye
DOI: https://doi.org/10.1186/s13015-023-00246-y
2024-01-04
Algorithms for Molecular Biology
Abstract: Background: Two genomes $$\mathbb{A}$$ and $$\mathbb{B}$$ over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by $$n_*$$ the number of common families of $$\mathbb{A}$$ and $$\mathbb{B}$$. Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths. Let $$c_i$$ and $$p_j$$ be respectively the numbers of cycles of length $$i$$ and of paths of length $$j$$ in the breakpoint graph of genomes $$\mathbb{A}$$ and $$\mathbb{B}$$. Then, the breakpoint distance of $$\mathbb{A}$$ and $$\mathbb{B}$$ is equal to $$n_*-\left(c_2+\frac{p_0}{2}\right)$$. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance of $$\mathbb{A}$$ and $$\mathbb{B}$$ is $$n_*-\left(c+\frac{p_e}{2}\right)$$, where $$c$$ is the total number of cycles and $$p_e$$ is the total number of paths of even length.
Motivation: The distance formulation is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as median or double distance. Interestingly, both median and double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider a $$\sigma_k$$ distance, defined to be $$n_*-\left(c_2+c_4+\ldots+c_k+\frac{p_0+p_2+\ldots+p_{k-2}}{2}\right)$$, and successively investigate the complexities of median and double distance for the $$\sigma_4$$ distance, then the $$\sigma_6$$ distance, and so on.
Results: For the median, considerable effort by our group and by other research groups has yielded no progress, even for the $$\sigma_4$$ distance. For the double distance under the $$\sigma_4$$ and $$\sigma_6$$ distances, however, we devised linear-time algorithms, which we present here.
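To make the distance formulas in the abstract concrete, the following is a minimal Python sketch that evaluates them from given cycle and path counts. It is not the authors' code and does not build a breakpoint graph or solve the double distance problem. The function names and the dictionary representation (length mapped to count) are assumptions made for illustration, and the counts in the example are hypothetical.

```python
from fractions import Fraction


def sigma_k_distance(n_star, cycles, paths, k):
    """Evaluate n_* - (c_2 + c_4 + ... + c_k + (p_0 + p_2 + ... + p_{k-2}) / 2).

    cycles and paths map a length to the number of cycles/paths of that
    length (a hypothetical representation chosen for this sketch).
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    # Cycles of even length 2..k contribute one unit each.
    c = sum(n for length, n in cycles.items()
            if length % 2 == 0 and 2 <= length <= k)
    # Paths of even length 0..k-2 contribute half a unit each.
    p = sum(n for length, n in paths.items()
            if length % 2 == 0 and 0 <= length <= k - 2)
    return n_star - (c + Fraction(p, 2))


def breakpoint_distance(n_star, cycles, paths):
    """n_* - (c_2 + p_0 / 2), i.e. the sigma_k formula with k = 2."""
    return sigma_k_distance(n_star, cycles, paths, 2)


def dcj_distance(n_star, cycles, paths):
    """n_* - (c + p_e / 2): all cycles and all even-length paths count."""
    c = sum(cycles.values())
    p_e = sum(n for length, n in paths.items() if length % 2 == 0)
    return n_star - (c + Fraction(p_e, 2))


# Hypothetical counts: six common families, one cycle each of length 2, 4, 6.
example_cycles = {2: 1, 4: 1, 6: 1}
example_paths = {}

print(breakpoint_distance(6, example_cycles, example_paths))   # 5
print(sigma_k_distance(6, example_cycles, example_paths, 4))   # 4
print(sigma_k_distance(6, example_cycles, example_paths, 6))   # 3
print(dcj_distance(6, example_cycles, example_paths))          # 3
```

In this example the values decrease from the breakpoint distance toward the DCJ distance as $$k$$ grows. Once $$k$$ is at least the longest cycle length and $$k-2$$ at least the longest even path length, the $$\sigma_k$$ formula and the DCJ formula give the same value. Exact fractions are used because each path contributes one half.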
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology