Nano Trees: Nanopore signal processing and sublevel fitting using Decision Trees

Kyle Briggs, Vincent Tabard-Cossa, Paula Branco, James Harden, Philipp Mensing, Deekshant Wadhwa
DOI: https://doi.org/10.26434/chemrxiv-2024-7mnqb
2024-09-23
Abstract: As the complexity of solid-state nanopore experiments increases, analysis of the resulting electrical signals to determine biomolecular details becomes a challenge. State-of-the-art techniques for this task perform poorly when transient signal characteristics approach the bandwidth limitations of the measurement electronics. In this work, we address this challenge through an algorithm, called Nano Trees, for fitting piecewise constant functions. Nano Trees leverages machine learning algorithms to fit the noisy piecewise constant data that is characteristic of nanopore ionic current signals, producing accurate fits on transients as short as twice the rise time of the measurement system. We demonstrate the performance of our algorithm on several real and synthetic datasets. These findings underscore the generalizability and accuracy of this approach in the regime of fast molecular translocations.
Chemistry
What problem does this paper attempt to address?
The paper addresses the difficulty of analyzing the electrical signals from solid-state nanopore experiments when transient signal features approach the bandwidth limit of the measurement electronics, a regime in which existing methods perform poorly. It proposes an algorithm called Nano Trees for fitting piecewise constant functions. The algorithm uses machine learning techniques to fit noisy piecewise constant data and produces accurate fits on transients as short as twice the rise time of the measurement system. The paper demonstrates the algorithm's performance on several real and synthetic datasets, which supports its generality and accuracy for fast molecular translocations. By improving the fitting and characterization of nanopore signals, the method aims to standardize the statistical analysis of complex nanopore signals across different experimental contexts, which in turn helps classify nanopore events in mixed samples. Overall, Nano Trees targets the limitations of existing techniques on fast transient signals in order to make nanopore data analysis more accurate and reliable.
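
To make the idea of fitting a piecewise constant function with a decision tree concrete, here is a minimal sketch using only the Python standard library. It is not the Nano Trees algorithm, whose details are not given in the abstract. It is a plain one-dimensional regression tree that splits the sample index wherever the squared error drops the most. The function names, the synthetic signal (baseline 1.0, blockage level 0.6, Gaussian noise), and the parameters `max_depth`, `min_len`, and `min_gain` are all assumptions chosen for illustration, and the sketch ignores the finite rise time of real measurement electronics.

```python
import random


def make_synthetic_event(seed=0, noise=0.02):
    """Hypothetical test signal: baseline, one blockage sublevel, baseline."""
    rng = random.Random(seed)
    levels = [1.0] * 100 + [0.6] * 50 + [1.0] * 100
    return [v + rng.gauss(0.0, noise) for v in levels]


def fit_piecewise_constant(y, max_depth=3, min_len=5, min_gain=0.05):
    """Fit a 1-D regression tree on the sample index.

    Returns a list of (start, end, level) tuples, ordered in time, where
    `level` is the mean of y[start:end].
    """
    # Prefix sums let us evaluate the squared error of any segment in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(a, b):
        # Sum of squared deviations of y[a:b] from its own mean.
        s = prefix[b] - prefix[a]
        return (prefix_sq[b] - prefix_sq[a]) - s * s / (b - a)

    segments = []

    def recurse(lo, hi, depth):
        best_gain, best_k = 0.0, None
        if depth < max_depth and hi - lo >= 2 * min_len:
            total = sse(lo, hi)
            # Try every split that leaves at least min_len samples per side.
            for k in range(lo + min_len, hi - min_len + 1):
                gain = total - sse(lo, k) - sse(k, hi)
                if gain > best_gain:
                    best_gain, best_k = gain, k
        if best_k is None or best_gain <= min_gain:
            # Leaf: the fitted level is the segment mean.
            segments.append((lo, hi, (prefix[hi] - prefix[lo]) / (hi - lo)))
        else:
            recurse(lo, best_k, depth + 1)
            recurse(best_k, hi, depth + 1)

    recurse(0, len(y), 0)
    return segments


signal = make_synthetic_event()
segments = fit_piecewise_constant(signal)
for start, end, level in segments:
    print(f"samples {start:3d} to {end:3d}: level {level:.3f}")
```

The `min_gain` threshold is what keeps the tree from splitting on noise alone, so in practice it would need to be tuned to the noise level of the recording. For real data, a library implementation such as scikit-learn's `DecisionTreeRegressor` is a more natural starting point than this hand-rolled version.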