Abstract: e15013 Background: Next-generation sequencing (NGS) can produce up to 6 Tb of data per run with single-nucleotide accuracy, making it well suited to quantifying isomiRs (canonical miRNAs and their variants) for clinical applications. However, NGS quantification of circulating isomiRs suffers from poor reproducibility and low sample throughput because of significant technical variation and the limitations of current multiplexing strategies; notably, no isomiR NGS technique has yet been used successfully to diagnose cancer. Methods: To address these challenges, a library construction method incorporating dual unique-dual-index (DUDI) technology was developed, in which each sample is labeled with a pair consisting of an inner UDI (IUDI) and an outer UDI (OUDI). Twelve independent batches of isomiR NGS were carried out, including three repeated batches; each batch included 100 gastric cancer and 100 control plasma samples. Technical reproducibility was evaluated with batch-effect analysis, correlation coefficients (R), and principal component analysis (PCA). Biological reproducibility was assessed with machine learning binary classification, in which each pair of batches served interchangeably as training and testing data. Results: In this multicenter study, over 700G of isomiR data were generated from 402 gastric cancer and 498 control samples, with a maximum error rate of 1 in 7 million isomiRs assigned to the wrong sample. The PCA plot shows high technical reproducibility across the three repeated batches: data points from each batch are extensively intermingled, with no distinct batch-wise clustering. This is reinforced by R values close to 1 for each of the 239 isomiRs between the repeated batches. Meanwhile, mutual machine learning validation between the repeated batches yielded ~95% accuracy, indicating high biological reproducibility.
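The mutual validation scheme described above (train on one batch, test on the other, then swap the roles) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the classifier, so a nearest-centroid classifier is used purely as a stand-in; each sample is assumed to be a pre-normalized vector of isomiR expression values labeled 1 (gastric cancer) or 0 (control); and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (isomiR expression vector, label); 1 = gastric cancer, 0 = control (assumed encoding)
Sample = Tuple[List[float], int]


def fit_centroids(train: Sequence[Sample]) -> Dict[int, List[float]]:
    """Stand-in classifier (hypothetical): mean expression vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids: Dict[int, List[float]], test: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted class matches the label."""
    hits = sum(predict(centroids, x) == y for x, y in test)
    return hits / len(test)


def mutual_validation(
    batch_a: Sequence[Sample], batch_b: Sequence[Sample]
) -> Tuple[float, float]:
    """Train on A and test on B, then swap roles; return (acc_A_to_B, acc_B_to_A)."""
    acc_ab = accuracy(fit_centroids(batch_a), batch_b)
    acc_ba = accuracy(fit_centroids(batch_b), batch_a)
    return acc_ab, acc_ba


if __name__ == "__main__":
    # Toy data with two isomiR features per sample, for illustration only.
    batch_1 = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
    batch_2 = [([0.8, 0.2], 1), ([0.2, 0.8], 0), ([0.0, 1.0], 1)]
    print(mutual_validation(batch_1, batch_2))
```

Reporting both directions is what makes the validation "mutual"; in practice a stronger classifier and proper feature normalization would replace the placeholder used here.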
Validations between batches containing different samples yielded accuracies of 70% to 82%. This lower accuracy is expected, given the high genetic heterogeneity of cancer and the small sample size. Furthermore, the isomiR differential expression profiles from the current NGS study closely match those from prior qPCR studies. Conclusions: The DUDI library construction method produces reproducible, high-sample-throughput NGS data while remaining cost-effective and straightforward. Because the IUDI and OUDI can each be any of the 976 designed DUDIs, up to 976 × 976 = 952,576 (almost one million) samples can be multiplexed in a single NGS project, far exceeding the sample throughput requirements of any NGS application. Moreover, the ability to distinguish true biological variation in isomiRs from technical noise, demonstrated by the high technical and biological reproducibility and the concordance with qPCR data, enables the development of robust machine learning algorithms for cancer diagnostics.
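The capacity arithmetic and the idea of labeling each sample with an (IUDI, OUDI) pair can be illustrated with a minimal demultiplexing sketch. It is not the authors' pipeline: each UDI is reduced to a single hypothetical label string (real UDIs are index sequences read by the sequencer, and mismatch-tolerant index matching is omitted), and the read tuple layout, sample IDs, and function names are assumptions. Reads whose index pair is not registered in the sample sheet are set aside as unassigned rather than credited to a sample, which is one general way that paired indexes limit sample mis-assignment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Size of the designed DUDI set quoted in the abstract.
N_DUDI = 976

IndexPair = Tuple[str, str]  # (IUDI label, OUDI label); labels are hypothetical


def dudi_capacity(n_indices: int = N_DUDI) -> int:
    """Distinct (IUDI, OUDI) pairs when each slot can take any of n_indices."""
    return n_indices * n_indices


def demultiplex(
    reads: Iterable[Tuple[str, str, str]],
    sample_sheet: Dict[IndexPair, str],
) -> Tuple[Dict[str, List[str]], Counter]:
    """Assign each read (iudi, oudi, sequence) to the sample registered for its pair.

    Pairs absent from the sample sheet are tallied in `unassigned`
    instead of being credited to any sample.
    """
    assigned: Dict[str, List[str]] = {s: [] for s in sample_sheet.values()}
    unassigned: Counter = Counter()
    for iudi, oudi, seq in reads:
        sample = sample_sheet.get((iudi, oudi))
        if sample is None:
            unassigned[(iudi, oudi)] += 1
        else:
            assigned[sample].append(seq)
    return assigned, unassigned


if __name__ == "__main__":
    print(dudi_capacity())  # 952576
    # Hypothetical sample sheet and reads (example sequences only).
    sheet = {("IUDI_001", "OUDI_001"): "GC_001", ("IUDI_002", "OUDI_002"): "CTRL_001"}
    reads = [
        ("IUDI_001", "OUDI_001", "TGAGGTAGTAGGTTGTATAGTT"),
        ("IUDI_002", "OUDI_002", "TGAGGTAGTAGGTTGTATAGT"),
        ("IUDI_001", "OUDI_002", "TGAGGTAGTAGGTTGTATAG"),  # unregistered pair
    ]
    assigned, unassigned = demultiplex(reads, sheet)
    print({s: len(r) for s, r in assigned.items()})  # {'GC_001': 1, 'CTRL_001': 1}
    print(sum(unassigned.values()))  # 1
```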