Reconstruction of multiple strings of constant weight from prefix-suffix compositions

Yaoyu Yang, Zitan Chen
2024-11-06
Abstract: Motivated by studies of data retrieval in polymer-based storage systems, we consider the problem of reconstructing a multiset of binary strings that have the same length and the same weight from the compositions of their prefixes and suffixes of every possible length. We provide necessary and sufficient conditions for which unique reconstruction up to reversal of the strings is possible. Additionally, we present two algorithms for reconstructing strings from the compositions of prefixes and suffixes of constant-length constant-weight strings.
Discrete Mathematics, Information Theory
What problem does this paper attempt to address?
The problem this paper addresses is how to reconstruct a multiset of binary strings with the same length \(n\) and the same weight \(\bar{w}\) from the compositions of their prefixes and suffixes. Specifically, the authors study when these strings can be uniquely reconstructed (up to reversal) from the prefix and suffix compositions of every possible length, and they give necessary and sufficient conditions for this uniqueness.

### Problem Background

With growing demand for archival data storage, researchers are exploring alternatives to traditional tape and hard-drive storage. Polymers such as DNA are considered promising media for future archival storage because of their high storage density and durability. However, common sequencing techniques can usually read only random fragments of the polymer, so data retrieval must work from the information these fragments provide.

### Research Motivation

In polymer-based storage systems, data retrieval relies on macromolecule sequencing techniques to read the information stored in the polymer. Because only fragments are read, the original data has to be reconstructed from information about those fragments. This motivates the problem the authors consider: reconstructing binary strings from the compositions of their prefixes and suffixes of every possible length.

### Main Contributions

1. **Necessary and Sufficient Conditions**: The authors give necessary and sufficient conditions under which the strings can be uniquely reconstructed (up to reversal) from their prefix-suffix compositions.
2. **Algorithm Design**: Two algorithms are proposed for reconstructing strings from the prefix-suffix compositions of constant-length, constant-weight strings. One algorithm efficiently outputs a multiset of strings whose prefix-suffix compositions match the input; the other outputs all multisets of strings (up to reversal) that match the input.

### Formula Representation

Some of the key definitions used in the paper are as follows. Here \(t[j]\) and \(t[-j]\) denote the prefix and the suffix of \(t\) of length \(j\), respectively.

- The weight of a binary string \(t\) of length \(n\) is its number of ones:
  \[ \text{wt}(t)=\sum_{i=1}^{n} t_i \]
- The composition of \(t\) records its number of zeros and its number of ones as an ordered pair:
  \[ (n-\text{wt}(t),\ \text{wt}(t)) \]
- The sets of prefix compositions and suffix compositions of \(t\) are:
  \[ M_p(t)=\left\{(j-\text{wt}(t[j]),\ \text{wt}(t[j]))\mid 1\leq j\leq n\right\} \]
  \[ M_s(t)=\left\{(j-\text{wt}(t[-j]),\ \text{wt}(t[-j]))\mid 1\leq j\leq n\right\} \]
- A Cumulative Weight Function (CWF) is a function \(f:T\rightarrow[n]\) that satisfies the following conditions:
  \[ f(0,m)=0\quad\forall m\in[2h] \]
  \[ f(l,m)-f(l-1,m)\in\{0,1\}\quad\forall (l,m)\in[n]\times[2h] \]
  \[ f(l,2j-1)+f(n-l,2j)=w_j\quad\forall l\in[n],\ j\in[h] \]

With these definitions and conditions, the authors formalize and solve the problem of reconstructing strings from the prefix-suffix compositions of constant-length, constant-weight strings.
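
The definitions of \(\text{wt}\), \(M_p\) and \(M_s\) above can be made concrete with a small Python sketch. It is not either of the paper's two algorithms: the function names are hypothetical, the exhaustive search is only practical for tiny parameters, and it assumes the observed data is the multiset union of \(M_p(t)\) and \(M_s(t)\) over all stored strings, which the summary does not spell out.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def wt(t):
    """Weight of a binary string t (a tuple of 0s and 1s): its number of ones."""
    return sum(t)


def composition(t):
    """Composition of t as the ordered pair (number of zeros, number of ones)."""
    return (len(t) - wt(t), wt(t))


def prefix_compositions(t):
    """M_p(t): compositions of the prefixes of t of lengths 1..n."""
    return Counter(composition(t[:j]) for j in range(1, len(t) + 1))


def suffix_compositions(t):
    """M_s(t): compositions of the suffixes of t of lengths 1..n."""
    n = len(t)
    return Counter(composition(t[n - j:]) for j in range(1, n + 1))


def prefix_suffix_compositions(strings):
    """ASSUMPTION: the observation is the multiset union of M_p and M_s
    over every string, with no record of which string produced which pair."""
    total = Counter()
    for t in strings:
        total += prefix_compositions(t)
        total += suffix_compositions(t)
    return total


def constant_weight_strings(n, w):
    """Yield every binary string of length n and weight w as a tuple."""
    for ones in combinations(range(n), w):
        yield tuple(1 if i in ones else 0 for i in range(n))


def brute_force_reconstruct(target, n, w, h):
    """Exhaustive search (illustration only, not the paper's method):
    return every multiset of h strings, up to reversal, whose
    prefix-suffix compositions equal target."""
    # Reversing t swaps M_p(t) and M_s(t), so one representative
    # per reversal pair is enough.
    reps = sorted({min(t, t[::-1]) for t in constant_weight_strings(n, w)})
    return [
        cand
        for cand in combinations_with_replacement(reps, h)
        if prefix_suffix_compositions(cand) == target
    ]


if __name__ == "__main__":
    t = (1, 1, 0, 0)
    observed = prefix_suffix_compositions([t])
    # prints [((0, 0, 1, 1),)]: the string is recovered up to reversal
    print(brute_force_reconstruct(observed, n=4, w=2, h=1))
```

When `brute_force_reconstruct` returns exactly one candidate, the input is uniquely reconstructible up to reversal in the sense described above; more than one candidate indicates an ambiguous input under the stated assumption.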