Quantum Algorithms for the Shortest Common Superstring and Text Assembling Problems

Kamil Khadiev, Carlos Manuel Bosch Machado, Zeyu Chen, Junde Wu
DOI: https://doi.org/10.26421/QIC24.3-4-4
2024-01-01
Abstract: In this paper, we consider two versions of the Text Assembling problem. We are given a sequence of strings $s^1,\dots,s^n$ of total length $L$ that is a dictionary, and a string $t$ of length $m$ that is a text. The first version of the problem is assembling $t$ from the dictionary. The second version is the ``Shortest Superstring Problem'' (SSP), also called the ``Shortest Common Superstring Problem'' (SCS). In this case, $t$ is not given, and we should construct the shortest string (we call it a superstring) that contains each string from the given sequence as a substring. These problems are connected with the sequence assembly method for reconstructing a long DNA sequence from small fragments. For both problems, we suggest new quantum algorithms that work better than their classical counterparts. In the first case, we present a quantum algorithm with $O(m+\log m\sqrt{nL})$ running time. In the case of SSP, we present a quantum algorithm with running time $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$.
Quantum Physics, Data Structures and Algorithms
What problem does this paper attempt to address?
This paper addresses two problems related to string assembly: the Shortest Common Superstring Problem (SCS) and the Text Assembling Problem (TAO). Specifically:

1. **Shortest Common Superstring Problem (SCS)**:
   - Problem Definition: Given a sequence of strings \(S = (s^1,\ldots,s^n)\), construct a string \(t\) (called a superstring) that is as short as possible and contains each string in \(S\) as a substring.
   - Application Background: This problem is related to sequence assembly methods in bioinformatics, i.e., reconstructing longer DNA sequences from smaller fragments.
   - Solution: A quantum algorithm is proposed with a runtime of \(O(n^3 1.728^n + L + \sqrt{L}n^{1.5} + \sqrt{L}n\log^2{L}\log^2{n})\), where \(L\) is the total length of the strings in \(S\).
2. **Text Assembling Problem (TAO)**:
   - Problem Definition: Given a sequence of strings \(S = (s^1,\ldots,s^n)\) and a target string \(t\), use the strings in \(S\) to construct \(t\), allowing overlaps.
   - Application Background: Related to sequence assembly methods in bioinformatics, particularly reference-guided genome assembly methods.
   - Solution: A quantum algorithm is proposed with a runtime of \(O(m+\log{m}\sqrt{nL})\), where \(m\) is the length of the target string \(t\), and \(L\) is the total length of the dictionary strings.

The paper details the background of the two problems, existing solutions, and the newly proposed quantum algorithms. According to the abstract, both quantum algorithms work better than their classical counterparts. Additionally, the paper discusses the connection between these problems and real-world applications, particularly in the field of bioinformatics.
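
To make the two problem definitions concrete, here is a minimal classical brute-force sketch in Python. It is not the paper's quantum algorithm and does not reflect its running times; the function names (`merge_with_overlap`, `shortest_superstring_bruteforce`, `assemble_text`) are hypothetical and chosen only for illustration. The superstring routine tries every ordering of the strings, so it is practical only for very small \(n\); the assembling routine uses a simple greedy interval cover and assumes that any occurrence of a dictionary string in \(t\) may be used, with overlaps allowed.

```python
from itertools import permutations


def merge_with_overlap(a: str, b: str) -> str:
    """Append b to a, reusing the longest suffix of a that is a prefix of b."""
    if b in a:
        return a
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b


def shortest_superstring_bruteforce(strings: list[str]) -> str:
    """Exhaustive search over orderings (illustrative only, factorial time)."""
    unique = list(dict.fromkeys(strings))  # drop exact duplicates
    # Strings contained in another string never affect the answer.
    core = [s for s in unique if not any(s != o and s in o for o in unique)]
    best = None
    for order in permutations(core):
        candidate = ""
        for s in order:
            candidate = merge_with_overlap(candidate, s)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best if best is not None else ""


def assemble_text(dictionary: list[str], t: str):
    """Return (position, word) pairs that cover t with overlaps, or None."""
    covered = 0  # t[:covered] is already covered
    cover = []
    while covered < len(t):
        best = None  # (end, start, word) of the match reaching furthest right
        for start in range(covered + 1):  # start <= covered leaves no gap
            for word in dictionary:
                end = start + len(word)
                if end > covered and t.startswith(word, start):
                    if best is None or end > best[0]:
                        best = (end, start, word)
        if best is None:
            return None  # no dictionary string extends the covered prefix
        cover.append((best[1], best[2]))
        covered = best[0]
    return cover


if __name__ == "__main__":
    print(shortest_superstring_bruteforce(["abc", "bcd", "cde"]))
    print(assemble_text(["aba", "bac", "ac"], "abac"))
```

In the first call the three fragments overlap pairwise by two characters, so the search finds a superstring of length 5. In the second call the target is covered by two overlapping dictionary strings, while a target such as `"abc"` with dictionary `["ab"]` cannot be assembled and yields `None`.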