Abstract: In this paper, we prove that with high probability, random Reed-Solomon codes approach the half-Singleton bound (the optimal rate versus error tradeoff for linear insdel codes) with linear-sized alphabets. More precisely, we prove that, for any $\varepsilon>0$ and positive integers $n$ and $k$, with high probability, random Reed-Solomon codes of length $n$ and dimension $k$ can correct $(1-\varepsilon)n-2k+1$ adversarial insdel errors over alphabets of size $n+2^{\mathsf{poly}(1/\varepsilon)}k$. This significantly improves upon the alphabet size demonstrated in the work of Con, Shpilka, and Tamo (IEEE TIT, 2023), who showed the existence of Reed-Solomon codes with exponential alphabet size $\widetilde O\left(\binom{n}{2k-1}^2\right)$ precisely achieving the half-Singleton bound.
Our methods are inspired by recent works on list-decoding Reed-Solomon codes. Brakensiek-Gopi-Makam (STOC 2023) showed that random Reed-Solomon codes are list-decodable up to capacity with exponential-sized alphabets, and Guo-Zhang (FOCS 2023) and Alrabiah-Guruswami-Li (STOC 2024) improved the alphabet-size to linear. We achieve a similar alphabet-size reduction by similarly establishing strong bounds on the probability that certain random rectangular matrices are full rank. To accomplish this in our insdel context, our proof combines the random matrix techniques from list-decoding with structural properties of Longest Common Subsequences.
What problem does this paper attempt to address?
The paper asks whether random Reed-Solomon codes can approach the half-Singleton bound under the insertion-deletion (insdel) error model, that is, whether they can correct a near-optimal number of insdel errors with high probability over a linear-sized alphabet.
Specifically, the paper aims to prove that for any \(\varepsilon>0\) and positive integers \(n\) and \(k\), a random Reed-Solomon code of length \(n\) and dimension \(k\) can, with high probability, correct \((1-\varepsilon)n-2k+1\) adversarial insdel errors over an alphabet of size \(n+2^{\mathsf{poly}(1/\varepsilon)}k\). This significantly improves on the work of Con, Shpilka, and Tamo (IEEE TIT, 2023), who showed that there exist Reed-Solomon codes exactly achieving the half-Singleton bound, but over an exponentially sized alphabet of size \(\widetilde O\left(\binom{n}{2k-1}^2\right)\).
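As a quick numerical illustration (the values \(n=100\), \(k=10\), \(\varepsilon=0.1\) are chosen here only as an example and do not come from the paper), the guarantee can be compared with the half-Singleton limit of \(n-2k+1\) correctable insdel errors for a linear code of length \(n\) and dimension \(k\):

\[
(1-\varepsilon)n-2k+1 = 0.9\cdot 100 - 20 + 1 = 71,
\qquad
n-2k+1 = 100 - 20 + 1 = 81 .
\]

In this instance the relaxation costs \(\varepsilon n = 10\) correctable errors relative to the optimum.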
### Background and Motivation
- **Error-Correcting Codes**: Error-correcting codes (ECC) aim to recover the original information from corrupted data. Common error models include substitution errors and erasure errors, while synchronization errors (such as insertion and deletion errors) are more complex.
- **Reed-Solomon Codes**: Reed-Solomon codes are a widely used type of linear code, usually used to correct substitution errors. However, their performance in handling insdel errors is not clear.
- **Linear Codes vs. Non-linear Codes**: Linear codes usually handle insdel errors less well than non-linear codes, but they offer compact representation and efficient encoding and decoding algorithms, among other advantages. It is therefore still very important to study the performance of linear codes under the insdel model.
### Main Contributions of the Paper
- **Main Result**: The paper proves that, with high probability, random Reed-Solomon codes over a linear-sized alphabet approach the half-Singleton bound and correct \((1-\varepsilon)n-2k+1\) insdel errors.
- **Method**: The paper draws on recent work on list-decoding Reed-Solomon codes and establishes strong bounds on the probability that certain random rectangular matrices are full-rank. To adapt this to the insdel error model, it combines these random matrix techniques with structural properties of longest common subsequences.
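A minimal brute-force sketch can make the error-correction claim concrete for a toy code. It relies on a standard fact: a code of length \(n\) corrects \(t\) insdel errors exactly when no two distinct codewords share a common subsequence of length \(n-t\). The field size `p = 7`, the evaluation points, and all function names below are illustrative assumptions, not taken from the paper, and exhaustive search is only feasible at this tiny scale.

```python
from itertools import combinations, product


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rs_codewords(alphas, k, p):
    """All Reed-Solomon codewords (f(a) for a in alphas), deg f < k, over GF(p), p prime."""
    words = []
    for coeffs in product(range(p), repeat=k):
        word = tuple(
            sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p for a in alphas
        )
        words.append(word)
    return words


def correctable_insdel_errors(words):
    """Largest t such that every pair of distinct codewords has LCS < n - t."""
    n = len(words[0])
    max_lcs = max(lcs_length(u, v) for u, v in combinations(words, 2))
    return n - max_lcs - 1


# Toy parameters (assumed for illustration): n = 5, k = 2 over GF(7).
p, k = 7, 2
alphas = (1, 2, 4, 3, 0)
words = rs_codewords(alphas, k, p)
t = correctable_insdel_errors(words)
print("correctable insdel errors:", t)
print("half-Singleton limit n - 2k + 1:", len(alphas) - 2 * k + 1)
```

The computed value can never exceed the half-Singleton limit \(n-2k+1\); how close it gets depends on the choice of evaluation points.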
### Technical Details
- **V-matrix**: The paper defines a V-matrix \(V_{k,\ell,I,J}\). If this matrix is not full-rank, there are two different codewords whose longest common subsequence exceeds a certain length, so the code cannot correct the corresponding number of insdel errors.
- **Probability Amplification**: By introducing the "relaxation" parameter \(\varepsilon\), the paper proves that the probability that a V-matrix is not full-rank can be significantly reduced, so that the code as a whole corrects \((1-\varepsilon)n-2k+1\) insdel errors with high probability over a linear-sized alphabet.
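To illustrate the rank condition, the sketch below builds a matrix of this kind over a small prime field and computes its rank by Gaussian elimination. The row layout \((1, \alpha_{I_s}, \dots, \alpha_{I_s}^{k-1}, \alpha_{J_s}, \dots, \alpha_{J_s}^{k-1})\) is an assumption based on a common formulation in the Reed-Solomon insdel literature; the paper's exact definition may differ. Under that layout, two polynomials \(f \neq g\) of degree below \(k\) with \(f(\alpha_{I_s}) = g(\alpha_{J_s})\) for every \(s\) yield a nonzero kernel vector, so full column rank rules out a common subsequence at positions \(I, J\). The function names, the field size 7, and the evaluation points are hypothetical.

```python
def v_matrix(alphas, k, I, J, p):
    """Assumed layout: row s is (1, a_I, ..., a_I^(k-1), a_J, ..., a_J^(k-1)) mod p.

    I and J are equal-length lists of 0-based, increasing positions into alphas.
    """
    rows = []
    for i, j in zip(I, J):
        ai, aj = alphas[i], alphas[j]
        left = [pow(ai, e, p) for e in range(k)]       # 1, a_I, ..., a_I^(k-1)
        right = [pow(aj, e, p) for e in range(1, k)]   # a_J, ..., a_J^(k-1)
        rows.append(left + right)
    return rows


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over GF(p), p prime, by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, needs Python 3.8+
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(x - factor * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


p, k = 7, 2
I, J = [0, 1, 2], [1, 2, 3]  # ell = 2k - 1 = 3 matched positions

# Consecutive points: f(x) = x and g(x) = x - 1 agree along the shift.
bad = rank_mod_p(v_matrix((0, 1, 2, 3, 4), k, I, J, p), p)
# Reordered points: the same I, J no longer admit such a pair.
good = rank_mod_p(v_matrix((1, 2, 4, 3, 0), k, I, J, p), p)
print(bad, good)
```

With consecutive evaluation points the matrix is rank-deficient (rank 2 out of 3), while the reordered points give full rank 3 for the same \(I, J\). This shows how the choice of evaluation points determines whether a given pair of position sequences can carry a long common subsequence.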
### Future Research Directions
1. **Lower Bound of Field Size**: Further study the minimum field size required for linear codes (not just Reed-Solomon codes) to approach the half-Singleton bound.
2. **Explicit Construction**: Provide an explicit construction of Reed-Solomon codes so that they can efficiently correct insdel errors.
3. **Decoding Algorithm**: Design an efficient decoding algorithm so that Reed-Solomon codes can handle insdel errors.
4. **Affine Codes**: Study the performance of affine codes under the insdel error model, especially whether they can approach or reach the Singleton bound.
Progress on these directions would extend the paper's theoretical result toward Reed-Solomon codes that can be constructed and decoded in practice under insdel errors.