Shift-Interleave Coding for DNA-Based Storage: Correction of IDS Errors and Sequence Losses

Ryo Shibata, Haruhiko Kaneko
2024-01-26
Abstract: We propose a novel coding scheme for DNA-based storage systems, called shift-interleave (SI) coding, designed to correct insertion, deletion, and substitution (IDS) errors, as well as sequence losses. The SI coding scheme employs multiple codewords from two binary low-density parity-check codes. These codewords are processed to form DNA base sequences through shifting, bit-to-base mapping, and interleaving. At the receiver side, an efficient non-iterative detection and decoding scheme is employed to sequentially estimate codewords. The numerical results demonstrate the excellent performance of the SI coding scheme in correcting both IDS errors and sequence losses.
Information Theory
What problem does this paper attempt to address?
This paper addresses insertion, deletion, and substitution (IDS) errors and sequence losses in DNA-based storage systems. Specifically, the authors propose a new coding scheme called shift-interleave (SI) coding. The scheme takes multiple codewords from two binary low-density parity-check (LDPC) codes and forms DNA base sequences from them through shifting, bit-to-base mapping, and interleaving. At the receiver, an efficient non-iterative detection and decoding scheme estimates these codewords one by one. Numerical results show that the SI coding scheme performs well in correcting both IDS errors and sequence losses. Additionally, the scheme is not limited by the length of a single code and can use long LDPC codes to improve performance; it also reduces computational complexity and offers significant advantages in handling both sequence losses and base-level errors.
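The three encoder-side operations named above (shifting, bit-to-base mapping, interleaving) can be illustrated with a toy sketch. It is not the authors' construction: the summary does not specify the shift amounts, the mapping table, or the interleaver, so the per-row cyclic shift, the `ACGT` mapping, and the column-wise interleaver below are all assumptions, as are the names `cyclic_shift`, `bits_to_bases`, `interleave`, and `encode_toy`. LDPC encoding itself is omitted; the inputs stand in for already-encoded codewords.

```python
from typing import List, Sequence

# Assumed mapping (not from the paper): bit pair (u, l) -> base at index 2*u + l.
BASES = "ACGT"


def cyclic_shift(bits: Sequence[int], s: int) -> List[int]:
    """Cyclically shift a bit sequence to the right by s positions."""
    bits = list(bits)
    s %= len(bits)
    return bits[-s:] + bits[:-s] if s else bits


def bits_to_bases(upper: Sequence[int], lower: Sequence[int]) -> str:
    """Pair one bit from each codeword and map each pair to a DNA base."""
    return "".join(BASES[2 * u + l] for u, l in zip(upper, lower))


def interleave(rows: Sequence[str]) -> str:
    """Column-wise interleaver: emit the i-th base of every row, for each i."""
    return "".join("".join(col) for col in zip(*rows))


def encode_toy(upper_words: Sequence[Sequence[int]],
               lower_words: Sequence[Sequence[int]]) -> str:
    """Toy pipeline: shift row i of the first code by i (an assumed rule),
    map bit pairs to bases, then interleave the resulting rows."""
    rows = [
        bits_to_bases(cyclic_shift(u, i), l)
        for i, (u, l) in enumerate(zip(upper_words, lower_words))
    ]
    return interleave(rows)


if __name__ == "__main__":
    upper = [[0, 0, 1, 1], [1, 0, 0, 0]]  # stand-ins for codewords of code 1
    lower = [[0, 1, 0, 1], [0, 0, 0, 0]]  # stand-ins for codewords of code 2
    print(encode_toy(upper, lower))
```

In this sketch the shift staggers where each codeword's bits land before they are paired into bases, and the interleaver spreads each row's bases across the output. Any intuition about how this helps against IDS errors or sequence losses should be checked against the paper itself.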