Abstract: As a powerful tool for storing digital information in chemically synthesized molecules, DNA-based data storage has undergone continuous development and received increasing attention. Efficiently recovering information from large-scale DNA strands that suffer from insertion, deletion, and substitution errors (collectively referred to as edit errors) is one of the major bottlenecks in DNA-based storage systems. To address this challenge, we propose a segmented-edit error-correcting code with a re-synchronization function, termed the DNA-LM code. Compared with previous segmented error-correcting codes, it has a systematic structure and does not require the endpoint of the received segment as prerequisite information for decoding. When the number of edit errors exceeds the edit error-correcting capability of a segment, the decoder can easily regain synchronization so that subsequent decoding continues. Both the encoding and decoding complexities are linear in the codeword length. The redundancy of each segment is $\lceil \log k\rceil +6$ quaternary symbols, where $k$ is the length of the message segment. We further generalize the decoding algorithm to deal with duplicated DNA strands, while keeping the time complexity linear in the codeword length and the number of duplications. Simulations under a stochastic edit-error model show that, at the low raw error rate of “next-gen” sequencing, our code enables error-free decoding when concatenated with the (255,223) RS code.
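
To make the notion of a stochastic edit-error model concrete, the following is a minimal sketch of one possible channel, not the model used in the paper's simulations. It assumes independent, per-symbol insertion, deletion, and substitution events on a quaternary strand; the function name `edit_channel` and the parameters `p_ins`, `p_del`, and `p_sub` are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"  # quaternary DNA alphabet


def edit_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass a strand through an i.i.d. edit-error channel.

    Assumed model (not taken from the paper): before each base, a
    uniformly random base is inserted with probability p_ins; the base
    itself is then deleted with probability p_del, substituted by a
    different base with probability p_sub, or kept otherwise.
    """
    out = []
    for base in strand:
        # Insertion event: emit a random base ahead of the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion event: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution event: replace with one of the three other bases.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            # No error: the base is transmitted unchanged.
            out.append(base)
    return "".join(out)


# Example usage with a fixed seed for reproducibility.
rng = random.Random(0)
received = edit_channel("ACGTACGTACGT", p_ins=0.01, p_del=0.01, p_sub=0.01, rng=rng)
```

Because insertions and deletions change the strand length, the received strand is generally not aligned with the transmitted one, which is why a decoder for such a channel has to recover segment boundaries as well as symbol values.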

Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

Codes with Biochemical Constraints and Single Error Correction for DNA-Based Data Storage

Multiple Errors Correction for Position-Limited DNA Sequences with GC Balance and No Homopolymer for DNA-Based Data Storage

Error-Correcting Codes for Short Tandem Duplication and Edit Errors

Concatenated Code Design for Constrained DNA Data Storage with Asymmetric Errors

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

Error Correction for DNA Storage

Unrestricted Error-Type Codebook Generation for Error Correction Code in DNA Storage Inspired by NLP

Coding over Sets for DNA Storage

Improved Coding over Sets for DNA-Based Data Storage

Optimized Code Design for Constrained DNA Data Storage With Asymmetric Errors

Sequence-Subset Distance and Coding for Error Control in DNA-based Data Storage

High Information Density and Low Coverage Data Storage in DNA with Efficient Channel Coding Schemes

Error-Correcting Codes for Combinatorial Composite DNA

A Segmented-Edit Error-Correcting Code with Re-Synchronization Function for DNA-Based Storage Systems

Exact Error Exponents of Concatenated Codes for DNA Storage

Embracing Errors Is More Efficient Than Avoiding Them Through Constrained Coding for DNA Data Storage

Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage

An End-to-End Coding Scheme for DNA-Based Data Storage With Nanopore-Sequenced Reads

Codes for Limited-Magnitude Probability Error in DNA Storage

Constrained Channel Capacity for DNA-Based Data Storage Systems