A Rewritable, Random-Access DNA-Based Storage System

S. M. Hossein Tabatabaei Yazdi, Yongbo Yuan, Jian Ma, Huimin Zhao, Olgica Milenkovic
DOI: https://doi.org/10.48550/arXiv.1505.02199
2015-05-09
Abstract: We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.
Information Theory
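The abstract's two core operations, selecting one block by its address and rewriting data at an arbitrary position inside it, can be illustrated with a toy string model. This is a minimal sketch under stated assumptions, not the authors' wet-lab protocol. The names `Block`, `select`, and `rewrite` are hypothetical. Primer-based PCR selection is modeled as an exact prefix match on a single leading address. Editing, which the paper performs with gBlocks and OE-PCR, is modeled as a plain substring overwrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical toy block: an address followed by its encoded payload."""
    address: str  # sequence a PCR primer would target (modeled, not real chemistry)
    payload: str  # encoded data stored after the address

    @property
    def strand(self) -> str:
        return self.address + self.payload


def select(pool: list[Block], primer: str) -> list[Block]:
    """Keep only blocks whose strand starts with the primer.

    Assumption: exact prefix matching stands in for primer-based PCR selection."""
    return [block for block in pool if block.strand.startswith(primer)]


def rewrite(block: Block, start: int, new_segment: str) -> Block:
    """Return a copy of `block` with payload[start:start + len(new_segment)] overwritten.

    Assumption: a substring overwrite stands in for gBlock / OE-PCR editing."""
    end = start + len(new_segment)
    if start < 0 or end > len(block.payload):
        raise ValueError("edit must lie inside the payload")
    new_payload = block.payload[:start] + new_segment + block.payload[end:]
    return Block(block.address, new_payload)


pool = [Block("ACGTTGCA", "AAAACCCC"), Block("TGCAACGT", "GGGGTTTT")]
hits = select(pool, "TGCAACGT")     # only the second block is "amplified"
edited = rewrite(hits[0], 2, "CA")  # payload becomes "GGCATTTT"
print(edited.strand)                # TGCAACGTGGCATTTT
```

Because `select` touches only the strands that match the primer, the other blocks are never read or decoded. This is the point of address-based random access, as opposed to decoding the whole file.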
What problem does this paper attempt to address?
The paper targets two key limitations of existing DNA storage systems: **no partial or random data access** and **read-only storage**. Specifically:

1. **Random access and partial reading**:
   - Existing DNA storage methods (such as those described in references [3] and [4]) must decode the entire file to read a single data fragment, which makes partial and random access very inefficient.
   - This full-file decoding requirement is a significant obstacle in practice, because applications usually need specific data segments rather than the entire file.
2. **Read-only storage**:
   - Current designs support only read-only storage and cannot edit or update the stored information. This rules out applications that require frequent updates, such as storing frequently changing data or recording edit histories.

To solve these problems, the authors developed a new DNA storage architecture with the following features:

- **Random access**: Specially designed address sequences allow specific data blocks to be selected and accessed without decoding the entire file.
- **Rewritability**: Stored information can be modified and updated with DNA editing techniques (gBlock-based editing and overlap-extension PCR, OE-PCR), which enables data rewriting.

### Main innovation points

1. **Address sequence design**:
   - The authors introduce uncorrelated address sequences. These addresses have constant GC content, large pairwise Hamming distance, and no self-correlation (no proper prefix of an address reappears as its own suffix or as a suffix of another address), and they are designed to avoid secondary structures. Together, these properties make each address unique and reliably accessible.
2. **Encoding method**:
   - Prefix-synchronized coding prevents address sequences or their substrings from appearing in the encoded data, which protects data accuracy and integrity.
3. **Experimental verification**:
   - Experiments verify the feasibility and accuracy of the system. The authors encoded parts of the Wikipedia pages of six American universities into DNA, then selected and edited the text for three of these schools, demonstrating both random access and rewriting.

### Summary

This paper proposes a new DNA storage architecture that addresses the lack of random access and rewriting in existing methods and provides a theoretical and technical basis for future DNA storage systems.
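The address-sequence properties and the prefix-synchronization requirement listed under **Main innovation points** can be checked mechanically. The following is a hedged illustration, not the paper's construction, and it rests on several assumptions:

- All function names are hypothetical.
- "Constant GC content" is taken to mean exactly 50% G/C.
- The Hamming threshold `min_distance=3` is an arbitrary example value.
- Secondary-structure screening, which needs a folding tool, is omitted.
- `codebook` enumerates valid payloads by brute force instead of using an explicit prefix-synchronized encoder.

```python
from itertools import combinations, product


def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in seq)


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def correlated(a: str, b: str) -> bool:
    """True if a proper prefix of `a` equals the suffix of `b` of the same length.

    Assumes equal-length inputs."""
    return any(a[:k] == b[-k:] for k in range(1, len(a)))


def valid_address_set(addresses: list[str], min_distance: int = 3) -> bool:
    n = len(addresses[0])
    if any(len(a) != n for a in addresses):
        return False
    # Assumption: "constant GC content" means exactly half the bases are G or C.
    if any(2 * gc_count(a) != n for a in addresses):
        return False
    # Assumption: min_distance is an illustrative threshold, not the paper's value.
    for a, b in combinations(addresses, 2):
        if hamming(a, b) < min_distance:
            return False
    # Mutual uncorrelatedness, including each address paired with itself.
    # Secondary-structure avoidance is NOT checked here.
    return not any(correlated(a, b) for a in addresses for b in addresses)


def prefix_synchronized(address: str, payload: str) -> bool:
    """True if `address` occurs in address + payload only at position 0."""
    return (address + payload).find(address, 1) == -1


def codebook(address: str, length: int) -> list[str]:
    """Brute-force list of payloads that keep the address prefix-synchronized.

    Illustrative only: feasible for tiny lengths, not an efficient encoder."""
    words = ("".join(p) for p in product("ACGT", repeat=length))
    return [w for w in words if prefix_synchronized(address, w)]


addresses = ["ACGCTT", "ATTGCG", "AGCTTC"]
print(valid_address_set(addresses))               # True
print(prefix_synchronized("ACGCTT", "GACGCTTG"))  # False: the address reappears
print(len(codebook("ACGCTT", 6)))                 # 4095 of 4096 payloads
```

The example addresses all start with `A` and contain no other `A`. A proper suffix therefore cannot start with `A` and can never equal a prefix, which is one simple way to guarantee mutual uncorrelatedness. For such a self-uncorrelated address, an occurrence straddling the address/payload boundary is impossible, so prefix synchronization reduces to the payload not containing the address. That is why exactly one of the 4096 length-6 payloads (the address itself) is rejected.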