Geno-Weaving: Low-Complexity Capacity-Achieving DNA Storage

Hsin-Po Wang, Venkatesan Guruswami
2024-09-02
Abstract: As a possible implementation of data storage using DNA, multiple strands of DNA are stored in a liquid container so that, in the future, they can be read by an array of DNA readers in parallel. These readers will sample the strands with replacement to produce a random number of noisy reads for each strand. An essential component of such a data storage system is how to reconstruct data out of these unsorted, repetitive, and noisy reads. It is known that if a single read can be modeled by a substitution channel $W$, then the overall capacity can be expressed by the "Poisson-ization" of $W$. In this paper, we lay down a rateless code along each strand to encode its index; we then lay down a capacity-achieving block code at the same position across all strands to protect data. That weaves a low-complexity coding scheme that achieves DNA's capacity.
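The sampling model in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's construction: it only shows why sampling strands with replacement gives each strand a random, approximately Poisson-distributed number of reads, which is the intuition behind the "Poisson-ization" of $W$. The function names and the `coverage` parameter (average reads per strand) are assumptions introduced for illustration, and read noise is not modeled.

```python
import math
import random
from collections import Counter


def sample_read_counts(num_strands, num_reads, seed=0):
    """Draw num_reads strands uniformly with replacement; return reads per strand."""
    rng = random.Random(seed)
    counts = [0] * num_strands
    for _ in range(num_reads):
        counts[rng.randrange(num_strands)] += 1
    return counts


def empirical_vs_poisson(num_strands=10000, coverage=3.0, seed=0, max_k=6):
    """Compare the fraction of strands read k times with the Poisson(coverage) pmf.

    `coverage` is a hypothetical parameter: the average number of reads per strand.
    Returns a list of (k, empirical_fraction, poisson_probability) tuples.
    """
    num_reads = int(coverage * num_strands)
    counts = sample_read_counts(num_strands, num_reads, seed)
    hist = Counter(counts)
    rows = []
    for k in range(max_k + 1):
        empirical = hist.get(k, 0) / num_strands
        poisson = math.exp(-coverage) * coverage ** k / math.factorial(k)
        rows.append((k, empirical, poisson))
    return rows


if __name__ == "__main__":
    for k, empirical, poisson in empirical_vs_poisson():
        print(f"k={k}: empirical={empirical:.4f}  poisson={poisson:.4f}")
```

Under these assumptions, the two columns should agree closely when the number of strands is large. Strands that receive zero reads are effectively erased, which is one reason a block code across strands is needed on top of the per-strand index code.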
Information Theory