Brownian motion data augmentation: a method to push neural network performance on nanopore sensors

Javier Kipen, Joakim Jaldén
DOI: https://doi.org/10.1101/2024.09.10.612270
2024-09-14
Abstract: Nanopores are highly sensitive sensors that have achieved commercial success in DNA/RNA sequencing, with potential applications in protein sequencing and biomarker identification. Solid-state nanopores, in particular, face challenges such as instability and low signal-to-noise ratios (SNRs), which lead scientists to adopt data-driven methods for nanopore signal analysis, although data acquisition remains restrictive. In this paper, we augment training samples by simulating virtual Brownian motion based on dynamic models in the literature. We apply this method to a publicly available dataset of a classification task containing nanopore reads of DNA with encoded barcodes. A neural network named QuipuNet was previously published for this dataset, and we demonstrate that our augmentation method produces a noticeable increase in QuipuNet's accuracy. Furthermore, we introduce a novel neural network named YupanaNet, which achieves greater accuracy (95.8%) than QuipuNet (94.6%) on the same dataset. YupanaNet benefits from both the enhanced generalization provided by Brownian motion data augmentation and the incorporation of novel architectures, including skip connections and a self-attention mechanism.
Bioinformatics
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **Data Augmentation Method**: The paper proposes Brownian motion data augmentation to improve neural network performance on nanopore sensor data. The method generates new training samples by simulating virtual Brownian motion, based on dynamic models from the literature, which improves the model's generalization.
2. **Improvement of Existing Models**: The paper introduces a new neural network, YupanaNet, which builds on QuipuNet by adding skip (residual) connections and a self-attention mechanism. On the same dataset, YupanaNet reaches 95.8% accuracy, compared with 94.6% for QuipuNet.
3. **Signal Processing Challenges**: Solid-state nanopores suffer from instability and low signal-to-noise ratios, which push researchers toward data-driven analysis even though data acquisition remains restrictive. Brownian motion data augmentation eases this data limitation and produces a noticeable increase in the accuracy of the existing QuipuNet model.

In summary, the paper aims to overcome challenges in nanopore signal analysis by combining a data augmentation technique with an improved neural network architecture, thereby raising accuracy on the barcode classification task.
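The augmentation idea described above can be illustrated with a minimal sketch. It is not the paper's dynamic model: it assumes, purely for illustration, that virtual Brownian motion can be approximated by perturbing the sampling positions of a recorded trace with a discrete random walk and re-interpolating. The function name `brownian_time_warp` and the parameter `sigma` (standard deviation of each random-walk step, in samples) are hypothetical.

```python
import numpy as np

def brownian_time_warp(trace, sigma, rng):
    """Return a time-warped copy of `trace` (illustrative assumption only).

    Each sample index is shifted by a discrete random walk with Gaussian
    steps of standard deviation `sigma`, and the trace is then linearly
    interpolated at the shifted positions.
    """
    trace = np.asarray(trace, dtype=float)
    n = trace.size
    # Gaussian increments of a discrete Wiener process, one per sample.
    steps = rng.normal(0.0, sigma, size=n)
    steps[0] = 0.0  # the first sample keeps its original position
    walk = np.cumsum(steps)
    # Nominal position plus random displacement, kept inside the trace.
    positions = np.clip(np.arange(n) + walk, 0, n - 1)
    # Linear interpolation of the original trace at the warped positions.
    return np.interp(positions, np.arange(n), trace)

# Toy trace: baseline, a rectangular current drop, baseline again.
rng = np.random.default_rng(0)
trace = np.concatenate([np.zeros(50), -np.ones(100), np.zeros(50)])
augmented = brownian_time_warp(trace, sigma=0.5, rng=rng)
unchanged = brownian_time_warp(trace, sigma=0.0, rng=rng)
```

Calling the function several times with different random draws yields several distinct variants of one recorded read, each keeping the original class label. With `sigma=0.0` the trace is returned unchanged, which gives a simple sanity check.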