Abstract:

Background: Resolving the structure of mRNA transcripts is a major challenge for both research and molecular diagnostics. Current approaches based on short-read RNA sequencing and RT-PCR cannot fully capture the complexity of transcript structure. Third-generation long-read sequencing addresses this problem by reading the transcript sequence directly. However, genes with low expression levels are difficult to study by whole-transcriptome sequencing. To overcome this technical limitation, we propose a novel method that captures the transcripts of a gene panel using a targeted enrichment approach compatible with the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms.

Results: We designed a set of probes to capture transcripts of a panel of genes involved in hereditary breast and ovarian cancer syndrome. We present SOSTAR (iSofOrmS annoTAtoR), a versatile pipeline to assemble, quantify and annotate isoforms from long-read sequencing data using a new tool specially designed for this application. The significant enrichment of transcripts achieved by our capture protocol, together with the SOSTAR annotation, allowed the identification of 1,231 unique transcripts within the gene panel across the eight patients sequenced. The structure of these transcripts was annotated at single-base resolution relative to a reference transcript. All major alternative splicing events of the BRCA1 and BRCA2 genes described in the literature were found. Complex splicing events such as pseudoexons were correctly annotated. SOSTAR enabled the identification of abnormal transcripts in the positive controls. In addition, a case of unexplained inheritance in a family with a history of breast and ovarian cancer was solved by identifying an SVA retrotransposon in intron 13 of the BRCA1 gene.

Conclusions: We have validated a new protocol for the enrichment of transcripts of interest using probes adapted to the ONT and PacBio platforms. This protocol allows a complete description of the alternative structures of transcripts, the estimation of their expression and the identification of aberrant transcripts in a single experiment. This proof of concept opens new possibilities for RNA structure exploration in both research and molecular diagnostics.
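To make concrete what annotating a transcript structure relative to a reference transcript can look like, the following is a minimal sketch. It is not the SOSTAR implementation; the function name `annotate_against_reference`, the event labels and the coordinates are hypothetical and chosen for illustration only. It assumes exons are given as 1-based inclusive coordinates on the forward strand and that each reference exon overlaps at most one observed exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, forward strand (assumption)


def annotate_against_reference(reference: List[Exon], observed: List[Exon]) -> List[str]:
    """Describe how an observed isoform differs from a reference transcript.

    Hypothetical helper: the labels below are illustrative, not SOSTAR's nomenclature.
    """
    events = []
    for idx, (ref_start, ref_end) in enumerate(reference, start=1):
        # Observed exons that overlap this reference exon.
        hits = [(s, e) for s, e in observed if s <= ref_end and e >= ref_start]
        if not hits:
            events.append(f"skip exon {idx}")
            continue
        # Simplification: only the first overlapping exon is compared.
        obs_start, obs_end = hits[0]
        if obs_start != ref_start:
            events.append(f"exon {idx}: start shifted by {obs_start - ref_start:+d} nt")
        if obs_end != ref_end:
            events.append(f"exon {idx}: end shifted by {obs_end - ref_end:+d} nt")
    # Observed exons with no reference overlap are candidate pseudoexons.
    for s, e in observed:
        if not any(s <= r_end and e >= r_start for r_start, r_end in reference):
            events.append(f"novel exon {s}-{e}")
    return events


# Toy coordinates, not taken from BRCA1 or BRCA2.
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
observed = [(100, 200), (300, 390), (450, 470), (700, 800)]
for event in annotate_against_reference(reference, observed):
    print(event)
```

In this toy case the observed isoform uses a splice site 10 nt upstream of the end of exon 2, skips exon 3 and includes an exon absent from the reference. A real annotator would also need to handle strand, variable transcript ends and intron retention.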

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

SQANTI-reads: a tool for the quality assessment of long read data in multi-sample lrRNA-seq experiments

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Illuminating the dark side of the human transcriptome with long read transcript sequencing

Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification

TEQUILA-seq: a versatile and low-cost method for targeted long-read RNA sequencing

IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles

Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner

Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches

Fine mapping of RNA isoform diversity using an innovative targeted long-read RNA sequencing protocol with novel dedicated bioinformatics pipeline

BIISQ: Bayesian nonparametric discovery of Isoforms and Individual Specific Quantification

LIQA: long-read isoform quantification and analysis

Accurate isoform quantification by joint short- and long-read RNA-sequencing

snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data

TrAnnoScope: A Modular Snakemake Pipeline for Full-Length Transcriptome Analysis and Functional Annotation

Real-time transcriptomic profiling in distinct experimental conditions

Long-read sequencing transcriptome quantification with lr-kallisto

RNA-SeQC: RNA-seq metrics for quality control and process optimization

Quality assessment and control of tissue specific RNA-seq libraries of Drosophila transgenic RNAi models