Abstract

Background: Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced, a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-Seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparisons between proteomics and microarray data have generally found poor correspondence, significantly higher degrees of correlation have been observed at the exon level. Combining protein and RNA data by searching LC–MS/MS data against a customized protein database derived from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates with higher confidence.

Results: We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC–MS/MS data using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass Spectrometry Search Algorithm against the customized alternative splicing database with the breast cancer plasma proteome. Twenty-six alternative splicing biomarker peptides, with one single intron event and one exon skipping event, were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which is consistent with the 256 alternative splicing biomarkers from the RNA-Seq data.

Conclusions: This paper presents a bioinformatics workflow for using RNA-Seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.
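The step of translating splice-junction sequences into in silico Isoform Junction Peptides can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which reading frames, protease rules, or length cutoffs were used, so the three-frame forward translation, the simple trypsin rule (cut after K/R unless followed by P), the `min_len` cutoff, and all function names and the FASTA header layout below are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate seq starting at offset `frame`; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def tryptic_peptides(protein):
    """Yield (start, end, peptide) using an assumed simple trypsin rule:
    cut after K or R unless the next residue is P; a stop ('*') ends a peptide."""
    start = 0
    for i, aa in enumerate(protein):
        if aa == "*":
            if i > start:
                yield start, i, protein[start:i]
            start = i + 1
        elif aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            yield start, i + 1, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield start, len(protein), protein[start:]


def junction_peptides(upstream, downstream, min_len=6):
    """Return sorted peptides whose coding span crosses the exon-exon junction.

    `upstream` and `downstream` are the nucleotide sequences flanking the
    junction; `min_len` is a hypothetical minimum peptide length."""
    seq = upstream + downstream
    junction = len(upstream)  # nucleotide index of the first downstream base
    found = set()
    for frame in range(3):
        protein = translate(seq, frame)
        for a, b, pep in tryptic_peptides(protein):
            # Peptide residues [a, b) cover nucleotides [frame+3a, frame+3b).
            if len(pep) >= min_len and frame + 3 * a < junction < frame + 3 * b:
                found.add(pep)
    return sorted(found)


def to_fasta(peptides, event_id):
    """Format peptides as FASTA entries; the header layout is illustrative."""
    return "".join(
        f">{event_id}|IJP{n}\n{pep}\n" for n, pep in enumerate(peptides, 1)
    )


if __name__ == "__main__":
    up = "GCTAAACTGGTGGAA"    # toy upstream exon end (encodes A K L V E in frame 0)
    down = "GATGGCAGCCGTACC"  # toy downstream exon start (encodes D G S R T)
    print(to_fasta(junction_peptides(up, down), "demo_event"))
```

Only peptides that include residues encoded on both sides of the junction are kept, since peptides lying entirely within one exon cannot distinguish a splice isoform. The resulting FASTA text could then serve as a customized search database for a tool such as the Open Mass Spectrometry Search Algorithm.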
