Direct transposition of native DNA for sensitive multimodal single-molecule sequencing

Arjun S. Nanda, Ke Wu, Iryna Irkliyenko, Brian Woo, Megan S. Ostrowski, Andrew S. Clugston, Leanne C. Sayles, Lingru Xu, Ansuman T. Satpathy, Hao G. Nguyen, E. Alejandro Sweet-Cordero, Hani Goodarzi, Sivakanthan Kasinathan, Vijay Ramani
DOI: https://doi.org/10.1038/s41588-024-01748-0
IF: 30.8
2024-05-10
Nature Genetics
Abstract: Concurrent readout of sequence and base modifications from long unamplified DNA templates by Pacific Biosciences of California (PacBio) single-molecule sequencing requires large amounts of input material. Here we adapt Tn5 transposition to introduce hairpin oligonucleotides and fragment (tagment) limiting quantities of DNA for generating PacBio-compatible circular molecules. We developed two methods that implement tagmentation and use 90–99% less input than current protocols: (1) single-molecule real-time sequencing by tagmentation (SMRT-Tag), which allows detection of genetic variation and CpG methylation; and (2) single-molecule adenine-methylated oligonucleosome sequencing assay by tagmentation (SAMOSA-Tag), which uses exogenous adenine methylation to add a third channel for probing chromatin accessibility. SMRT-Tag of 40 ng or more human DNA (approximately 7,000 cell equivalents) yielded data comparable to gold standard whole-genome and bisulfite sequencing. SAMOSA-Tag of 30,000–50,000 nuclei resolved single-fiber chromatin structure, CTCF binding and DNA methylation in patient-derived prostate cancer xenografts and uncovered metastasis-associated global epigenome disorganization. Tagmentation thus promises to enable sensitive, scalable and multimodal single-molecule genomics for diverse basic and clinical applications.
genetics & heredity
What problem does this paper attempt to address?
The main problem this paper addresses is: **how to achieve sensitive, multimodal single-molecule sequencing by directly transposing native DNA, thereby reducing the input material required while still reading genetic and epigenetic information from the same molecules**. Specifically, the authors developed two Tn5-transposase-based methods that substantially reduce the DNA input required for PacBio single-molecule real-time (SMRT) sequencing while maintaining high sensitivity and versatility.

### Analysis of the main problems:

1. **The need to reduce the amount of DNA input**:
   - Current PacBio SMRT sequencing protocols require a large amount of DNA input (usually at least 1–5 µg), which limits their application to rare clinical samples, single cells, and microorganisms.
   - The authors adapt Tn5 transposition to fragment DNA and attach hairpin oligonucleotides in one step (tagmentation), generating circular molecules compatible with PacBio sequencing and reducing the required input by 90–99%.
2. **Enhancing the versatility of sequencing**:
   - PacBio sequencing can read sequence and base modifications (such as methylation) concurrently from unamplified DNA, but only when large amounts of input are available.
   - The authors developed two new methods that bring this multimodal readout to low-input samples:
     - **SMRT-Tag**: Used to detect genetic variation and CpG methylation.
     - **SAMOSA-Tag**: Adds a third channel through exogenous adenine methylation to probe chromatin accessibility.
3. **Enhancing the sensitivity and resolution of single-molecule sequencing**:
   - The authors demonstrated that SMRT-Tag yields data comparable to gold standard whole-genome and bisulfite sequencing from a small amount of DNA (40 ng or more of human DNA, equivalent to approximately 7,000 cells).
   - SAMOSA-Tag of 30,000–50,000 nuclei resolved single-fiber chromatin structure, CTCF binding, and DNA methylation in patient-derived prostate cancer xenografts, revealing metastasis-associated global epigenome disorganization.

### Formula summary:

The formulas involved in this paper appear mainly in the data analysis and statistical testing, for example:

- **F1 score**, the harmonic mean of precision and recall:
  \[ F1 = 2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}} \]
- **Pearson correlation coefficient** between paired observations \(x_i\) and \(y_i\) with means \(\bar{x}\) and \(\bar{y}\):
  \[ r=\frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2}\sum{(y_i - \bar{y})^2}}} \]

Together, these methods provide new tools for genomics research, particularly when working with limited samples and complex disease models.
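
To make the two formulas in the summary concrete, here is a minimal Python sketch that computes them from their definitions. The function names `f1_score` and `pearson_r` and the input values are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce the authors' analysis pipeline.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as in the F1 formula."""
    if precision + recall == 0:
        # Convention assumed here: F1 is 0 when both terms are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(f1_score(precision=0.9, recall=0.8))
    print(pearson_r([0.1, 0.5, 0.9], [0.2, 0.4, 0.95]))
```

In a setting like the one described, an F1 score would typically summarize variant-calling accuracy against a reference call set, and a Pearson correlation would compare per-site methylation estimates between two assays; which comparisons the paper actually reports with each metric is not stated in this summary.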