The Quasi-probability Method and Applications for Trace Reconstruction

Ittai Rubinstein
2024-10-16
Abstract: In the trace reconstruction problem, one attempts to reconstruct a fixed but unknown string $x$ of length $n$ from a given number of traces $\tilde{x}$ drawn iid from the application of a noisy process (such as the deletion channel) to $x$. The best known algorithm for trace reconstruction from the deletion channel is due to Chase, and recovers the input string with high probability given $\exp(\tilde{O}(n^{1/5}))$ traces [Cha21b]. The main component in Chase's algorithm is a procedure for k-mer estimation, which, for any marker $w$ in $\{0, 1\}^k$ of length $k$, computes a "smoothed" distribution of its appearances in the input string $x$ [CGL+23, MS24]. Current k-mer estimation algorithms fail when the deletion probability is above $1/2$, requiring a more complex analysis for Chase's algorithm. Moreover, the only known extension of these approaches beyond the deletion channel is based on numerically estimating high-order differentials of a multivariate polynomial, making it highly impractical [Rub23]. In this paper, we construct a simple Monte Carlo method for k-mer estimation which can be easily applied to a much wider variety of channels. In particular, we solve k-mer estimation for any combination of insertion, deletion, and bit-flip channels, even in the high deletion probability regime, allowing us to directly apply Chase's algorithm for this wider class of channels. To accomplish this, we utilize an approach from the field of quantum error mitigation (the process of using many measurements from noisy quantum computers to simulate a clean quantum computer), called the quasi-probability method (also known as probabilistic error cancellation) [TBG17, PSW22]. We derive a completely classical version of this technique, and use it to construct a k-mer estimation algorithm. No background in quantum computing is needed to understand this paper.
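To make the channel model in the abstract concrete, here is a minimal Python sketch of how traces might be generated. It assumes the standard deletion channel (each bit dropped independently with probability `q`), optionally followed by an independent bit-flip with probability `flip_p`. The function and parameter names are illustrative and not taken from the paper. Insertions are left out because the insertion model is not specified here.

```python
import random

def sample_trace(x, q, flip_p=0.0, rng=random):
    """Pass the bit string x through a deletion channel, then a bit-flip channel.

    Assumed model (not taken from the paper): each bit is deleted
    independently with probability q, and each surviving bit is flipped
    independently with probability flip_p.
    """
    trace = []
    for bit in x:
        if rng.random() < q:          # this bit is deleted
            continue
        if rng.random() < flip_p:     # this surviving bit is flipped
            bit = "1" if bit == "0" else "0"
        trace.append(bit)
    return "".join(trace)

def is_subsequence(t, x):
    """Return True if t can be obtained from x by deleting characters."""
    it = iter(x)
    return all(c in it for c in t)

rng = random.Random(0)                # fixed seed for reproducibility
x = "1011001110"
traces = [sample_trace(x, q=0.3, rng=rng) for _ in range(5)]
```

With `flip_p=0.0`, every trace is a subsequence of `x`. The reconstruction task is to recover `x` from many such independent traces.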
Data Structures and Algorithms, Information Theory
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses a key challenge in the **trace reconstruction problem**: performing k-mer estimation under high deletion probability and under more complex noise channels (such as insertion and bit-flip channels). Specifically:

1. **Overview of the trace reconstruction problem**:
   - Given multiple traces \(\tilde{x}\) generated from an unknown binary string \(x\) of length \(n\) by a noise channel (e.g., the deletion channel), the goal is to reconstruct the original string \(x\).
   - The best known algorithm is due to Chase and recovers the input string with high probability from \(\exp(\tilde{O}(n^{1/5}))\) traces in the worst case [Cha21b].
2. **Limitations of existing methods**:
   - Existing k-mer estimation methods fail when the deletion probability exceeds 1/2, which complicates the analysis of Chase's algorithm and makes it hard to apply directly to more complex noise models.
   - Rubinstein extended Chase's analysis to the insertion-deletion-symmetric channel, but that approach relies on numerically estimating high-order differentials of a multivariate polynomial, which makes it impractical to implement [Rub23].
3. **Main contributions of the paper**:
   - The paper proposes a simple Monte Carlo method for k-mer estimation based on the **quasi-probability method**. It applies to a wider range of noise channels, including any combination of insertion, deletion, and bit-flip channels, and it still works when the deletion probability is high.
   - The new method simplifies the algorithm design, reduces the time complexity, and plugs directly into Chase's algorithm, extending that algorithm to this wider class of channels.
4. **Specific objectives**:
   - Construct a k-mer estimation algorithm for any combination of insertion, deletion, and bit-flip channels that remains effective at high deletion probability (\(1/2 \le q < 1\)).
   - Combined with Chase's algorithm, achieve worst-case trace reconstruction from \(\exp(\tilde{O}(n^{1/5}))\) traces.

### Formula summary

- **Deletion channel**: Each bit of \(x\) is deleted independently with probability \(q\).
- **Sample complexity of k-mer estimation**: Defined as \(N = (1 + O(\alpha^2))n \times \text{poly}(\varepsilon^{-1}) \times \text{polylog}(1/\delta)\), where \(\alpha\) is the frequency parameter, \(\varepsilon\) is the error, and \(\delta\) is the failure probability.

By introducing the quasi-probability method, the paper overcomes the limitations of existing k-mer estimation methods at high deletion probability and under more complex noise, giving a more general and more practical approach to the trace reconstruction problem.
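The quasi-probability idea itself can be illustrated on a much smaller problem than k-mer estimation. The sketch below is a toy example, not the paper's algorithm. It assumes a single bit \(x\) observed through a bit-flip channel with flip probability \(p < 1/2\). The inverse of that channel can be written as a signed combination of "keep" and "flip" with coefficients \((1-p)/(1-2p)\) and \(-p/(1-2p)\). Sampling an operation in proportion to the absolute coefficients, then reweighting by the sign and the total weight \(\gamma = 1/(1-2p)\), gives an unbiased Monte Carlo estimate of \(f(x)\) from noisy observations only. All names here (`quasi_weights`, `quasi_sample`, `estimate`) are hypothetical.

```python
import random

def quasi_weights(p):
    """Coefficients expressing the inverse bit-flip channel as c_keep*I + c_flip*X."""
    assert 0 <= p < 0.5, "this toy inverse needs a flip probability below 1/2"
    c_keep = (1 - p) / (1 - 2 * p)
    c_flip = -p / (1 - 2 * p)         # negative: hence "quasi"-probability
    return c_keep, c_flip

def quasi_sample(y, f, p, rng):
    """One signed, reweighted sample whose mean over noisy y equals f(x)."""
    c_keep, c_flip = quasi_weights(p)
    gamma = abs(c_keep) + abs(c_flip)  # total weight, equal to 1 / (1 - 2p)
    if rng.random() < abs(c_keep) / gamma:
        return gamma * f(y)            # "keep" operation, sign +1
    return -gamma * f(1 - y)           # "flip" operation, sign -1

def estimate(x, f, p, n, rng):
    """Average n quasi-probability samples drawn from noisy views of bit x."""
    total = 0.0
    for _ in range(n):
        y = 1 - x if rng.random() < p else x   # noisy observation of x
        total += quasi_sample(y, f, p, rng)
    return total / n

rng = random.Random(1)                 # fixed seed for reproducibility
est = estimate(x=1, f=lambda b: b, p=0.2, n=100_000, rng=rng)
```

Each sample has magnitude at most \(\gamma\), so the number of samples needed for a fixed accuracy grows roughly like \(\gamma^2\), which blows up as \(p\) approaches \(1/2\) in this toy setting.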