A practical DNA data storage using expanded alphabet introducing 5-methylcytosine

Deruilin Liu, Demin Xu, Liuxin Shi, Jiayuan Zhang, Kewei Bi, Bei Luo, Chen Liu, Yuxiang Li, Guangyi Fan, Wen Wang, Zhi Ping
DOI: https://doi.org/10.1101/2024.12.26.630439
2024-12-26
Abstract: DNA is a promising next-generation data storage medium. Recently, it has been proposed in theory that non-natural or modified bases can serve as extra molecular letters to increase information density. However, the strategy has been hard to put into practice because non-natural DNA sequences are difficult to synthesize and structurally complex. Here, we describe a practical DNA data storage transcoding scheme named R+, based on a molecular alphabet expanded with 5-methylcytosine (5mC). We also validate the scheme experimentally by encoding one representative file into several in vitro DNA fragments of 1.3 to 1.6 kbp for nanopore sequencing. The results show average data recovery rates of 98.97% with a reference and 86.91% without one. This work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.
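To make the information-density argument concrete, the following minimal sketch compares the theoretical upper bound on bits per base for the natural four-letter alphabet with that of a five-letter alphabet that adds 5mC. It is not the R+ transcoding scheme, which the abstract does not specify. The function name `bits_per_base` is hypothetical, and the calculation assumes an idealized case in which every position can hold any letter independently, with no synthesis or sequencing constraints.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Upper bound on information per position for an unconstrained alphabet."""
    return math.log2(alphabet_size)


# Natural alphabet: A, C, G, T.
natural = bits_per_base(4)

# Expanded alphabet: A, C, G, T plus 5mC.
# Assumption: 5mC behaves as a fully independent fifth letter at every position.
expanded = bits_per_base(5)

# Relative gain in theoretical capacity from the extra letter.
gain = expanded / natural - 1

print(f"{natural:.2f} vs {expanded:.2f} bits/base, relative gain {gain:.1%}")
```

Under these assumptions the bound rises from 2 bits to about 2.32 bits per base. A practical scheme would likely achieve less, since real encodings must also respect biochemical constraints and tolerate read errors.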
Synthetic Biology