Abstract: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility: it supports long reads, real-time signal analysis, and reading of both DNA and RNA. Storage designs should be flexible and efficient enough to match these capabilities, but relatively few have been explored, and many are inefficient in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs. We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. We evaluate each decoder on the same population of nanopore reads collected from a synthesized library of strands. The strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state-of-the-art soft decoder achieves an error rate of 2.25%. However, that design is slow, requiring 183 seconds per read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257x, at the cost of a higher byte error rate of 3.52%. We use this speed to explore more design options and show that read densities of 0.33 bits/base can be achieved, 4x higher than those of prior MSA-based decoders. We also compare RNA to DNA and find that RNA yields 85% as many error-free reads as DNA.

Nanopore Decoding with Speed and Versatility for Data Storage
