A New Version of q-ary Varshamov-Tenengolts Codes with More Efficient Encoders: The Differential VT Codes and The Differential Shifted VT Codes

Tuan Thanh Nguyen, Kui Cai, Paul H. Siegel
2023-11-08
Abstract: The problem of correcting deletions and insertions has recently received significantly increased attention due to the DNA-based data storage technology, which suffers from deletions and insertions with extremely high probability. In this work, we study the problem of constructing non-binary burst-deletion/insertion correcting codes. Particularly, for the quaternary alphabet, our designed codes are suited for correcting a burst of deletions/insertions in DNA storage. Non-binary codes correcting a single deletion or insertion were introduced by Tenengolts [1984], and the results were extended to correct a fixed-length burst of deletions or insertions by Schoeny et al. [2017]. Recently, Wang et al. [2021] proposed constructions of non-binary codes of length n, correcting a burst of length at most two for q-ary alphabets with redundancy log n + O(log q log log n) bits, for arbitrary even q. The common idea in those constructions is to convert non-binary sequences into binary sequences, and the error decoding algorithms for the q-ary sequences are mainly based on the success of recovering the corresponding binary sequences, respectively. In this work, we look at a natural solution in which the error detection and correction algorithms are performed directly over q-ary sequences, and for certain cases, our codes provide a more efficient encoder with lower redundancy than the best-known encoder in the literature.
Information Theory
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses deletion and insertion errors, which are common in DNA data storage. Specifically, it studies how to construct non-binary burst-deletion/insertion correcting codes. For the quaternary alphabet, these codes are suited to correcting a burst of deletions or insertions in DNA storage.

### Main Contributions

1. **Single Error Correcting Codes**:
   - A new non-binary Varshamov-Tenengolts (VT) code, called the Differential VT Code, is proposed to correct a single deletion or insertion.
   - The new code is simpler and more efficient than the construction proposed by Tenengolts [1984].
   - A linear-time encoder maps user messages to q-ary codewords of length n with a redundancy of at most ⌈log_q(n)⌉ + 1 symbols, while the optimal redundancy is at least log_q(n) + log_q(q-1) symbols.
   - Compared to the best-known encoder of Tenengolts, the new encoder saves at least 2 redundant symbols, equivalent to 2 log_2(q) bits.
2. **Burst Error Correcting Codes**:
   - The idea of binary shifted VT codes is used to define q-ary Differential Shifted VT Codes.
   - A non-binary code correcting a burst of at most 2 deletions (or insertions) is proposed, with a redundancy of log(n) + 3 log log(n) + O(log q) bits. For all q ≥ 8, this improves on the result of Wang et al. [2021], whose redundancy is log(n) + O(log q log log n) bits.
   - The construction is extended to design non-binary codes that can correct any t deletions (or insertions).

### Background and Motivation

- **Background**: Deletion and insertion errors are common in many data storage and communication settings, such as bit-patterned media magnetic recording systems, racetrack memory devices, synchronization errors in communication systems, and mobile data. In DNA-based data storage in particular, the probability of deletion and insertion errors is extremely high.
- **Motivation**: Designing codes that correct deletions and insertions is challenging even in the most basic case of a single error. Deletions and insertions are more disruptive than substitutions because a small number of them can produce a large Hamming distance between the original sequence and the received sequence.

### Methods and Techniques

- **Differential Vector**: The paper uses differential vectors to detect and correct errors directly over q-ary sequences, rather than converting them to binary sequences.
- **Linear Time Algorithm**: Linear-time encoding and decoding algorithms are designed for the proposed codes.
- **Redundancy Optimization**: The encoder uses fewer redundant symbols than earlier encoders while maintaining the same error correction capability.

### Conclusion

By proposing the Differential VT Codes and the Differential Shifted VT Codes, the paper addresses single-error and burst-error correction over non-binary alphabets. For certain cases, the new codes provide a more efficient encoder with lower redundancy than the best-known encoder in the literature.
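
The differential-vector idea summarized above can be illustrated with a small Python sketch. The summary does not state the code's definition, so the sketch assumes one: the differential vector of x is y with y_i = x_i - x_{i+1} (mod q) for i < n and y_n = x_n, and a codeword class collects all x whose syndrome, the sum of i * y_i, equals a fixed value a modulo q * n. The function names are hypothetical, and the decoder is a brute-force search for illustration only, not the linear-time algorithm the paper describes.

```python
from itertools import product


def diff_vector(x, q):
    """Assumed definition: y_i = x_i - x_{i+1} (mod q) for i < n, and y_n = x_n."""
    n = len(x)
    return [(x[i] - x[i + 1]) % q for i in range(n - 1)] + [x[-1]]


def syndrome(x, q):
    """VT-style syndrome sum(i * y_i) of the differential vector, reduced mod q*n."""
    y = diff_vector(x, q)
    return sum(i * yi for i, yi in enumerate(y, start=1)) % (q * len(x))


def deletion_ball(x):
    """All sequences obtained from the tuple x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def classes_correct_single_deletion(n, q):
    """Brute-force check: inside every syndrome class, deletion balls are disjoint."""
    classes = {}
    for x in product(range(q), repeat=n):
        classes.setdefault(syndrome(x, q), []).append(x)
    for words in classes.values():
        seen = set()
        for x in words:
            ball = deletion_ball(x)
            if seen & ball:  # two codewords share a received word
                return False
            seen |= ball
    return True


def decode_single_deletion(received, a, q):
    """Brute force: try every single insertion and keep those with syndrome a."""
    n = len(received) + 1
    candidates = {
        received[:i] + (s,) + received[i:] for i in range(n) for s in range(q)
    }
    return sorted(c for c in candidates if syndrome(c, q) == a)


if __name__ == "__main__":
    x, q = (2, 0, 1, 3), 4
    a = syndrome(x, q)
    received = x[:1] + x[2:]  # delete the second symbol
    print(diff_vector(x, q), a)
    print(decode_single_deletion(received, a, q))
    print(classes_correct_single_deletion(4, 3))
```

Sequences are passed as tuples so that they can be stored in sets. Under the assumed definition there are q * n syndrome classes, so the largest class contains at least q^n / (q * n) words, which corresponds to a redundancy of at most log_q(n) + 1 symbols and is in line with the ⌈log_q(n)⌉ + 1 figure quoted in the summary.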