The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Abstract

Functional relationship networks, which reveal the collaborative roles between genes, have significantly accelerated our understanding of gene functions and phenotypic relevance. However, establishing such networks for alternatively spliced isoforms remains a difficult, unaddressed problem because systematic functional annotations are lacking at the isoform level, which makes most supervised learning methods difficult to apply to isoforms. Here we describe a novel multiple instance learning-based probabilistic approach that integrates large-scale, heterogeneous genomic datasets, including RNA-seq, exon array, protein docking and pseudo-amino acid composition, to model a global functional relationship network at the isoform level in the mouse. In this approach, we formulate a gene pair as a set of isoform pairs with potentially different properties. Through simulation and cross-validation studies, we show the superior accuracy of our algorithm in revealing isoform-level functional relationships. The local networks reveal the functional diversity of isoforms of the same gene, as demonstrated by large-scale analyses and by experimental and literature evidence for the disparate functions that our network reveals for the isoforms of Ptbp1 and Anxa6. Our work can assist the understanding of the diversity of functions achieved by alternative splicing of a limited set of genes in mammalian genomes, and may shift the current gene-centered network prediction paradigm to the isoform level.

Author summary

Proteins carry out their functions by interacting with each other. Such interactions can occur through direct physical interactions, genetic interactions, or co-regulation. To summarize these interactions, researchers have established functional relationship networks, in which each gene is represented as a node and the connections between nodes represent how likely two genes are to work in the same biological process. Currently, these networks are established at the gene level only, even though each gene in mammalian systems can be alternatively spliced into multiple isoforms that may have drastically different interaction partners. This information can be mined by integrating data that provide isoform-level information, such as RNA-seq and protein docking scores predicted from amino acid sequences. In this study, we developed a novel algorithm to integrate such data for predicting isoform-level functional relationship networks, which allows us to investigate the collaborative roles between genes at a high resolution.
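
The formulation mentioned in the abstract, in which a gene pair is treated as a set of isoform pairs, can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the generic multiple instance learning idea, assuming the common rule that a bag (gene pair) is scored by its highest-scoring instance (isoform pair). The isoform names, the scores, and the helper functions `isoform_pair_bag` and `bag_score` are hypothetical and chosen purely for illustration.

```python
from itertools import product


def isoform_pair_bag(isoforms_a, isoforms_b):
    """Build the 'bag' for one gene pair: every isoform pair across the two genes."""
    return list(product(isoforms_a, isoforms_b))


def bag_score(bag, instance_score):
    """Score a gene pair by its best isoform pair.

    Assumption: the standard MIL rule that a bag is positive if at least
    one instance is positive, approximated here by taking the maximum.
    """
    return max(instance_score(a, b) for a, b in bag)


# Hypothetical isoform-pair scores (made-up values, not from the paper).
toy_scores = {
    ("GeneA.1", "GeneB.1"): 0.15,
    ("GeneA.1", "GeneB.2"): 0.82,
    ("GeneA.2", "GeneB.1"): 0.05,
    ("GeneA.2", "GeneB.2"): 0.30,
}

bag = isoform_pair_bag(["GeneA.1", "GeneA.2"], ["GeneB.1", "GeneB.2"])
score = bag_score(bag, lambda a, b: toy_scores[(a, b)])

# The isoform pair that carries the gene-level relationship in this toy case.
best_pair = max(bag, key=lambda pair: toy_scores[pair])
```

In this toy case the gene-level relationship is attributed to a single isoform pair, while the other isoform pairs of the same two genes receive low scores. This is the kind of within-gene functional diversity that an isoform-level network is meant to expose.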

Revisiting the Identification of Canonical Splice Isoforms Through Integration of Functional Genomics and Proteomics Evidence

Functional Networks of Highest-Connected Splice Isoforms: from the Chromosome 17 Human Proteome Project

A Proteogenomic Approach to Understand Splice Isoform Functions Through Sequence and Expression-Based Computational Modeling.

A Network of Splice Isoforms for the Mouse

Modeling the functional relationship network at the isoform level through heterogeneous data integration

Modeling the functional relationship network at the splice isoform level through heterogeneous data integration

The Emerging Era of Genomic Data Integration for Analyzing Splice Isoform Function.

Integrating many co-splicing networks to reconstruct splicing regulatory modules

Discovery of Novel Genes and Gene Isoforms by Integrating Transcriptomic and Proteomic Profiling from Mouse Liver.

Identifiability of isoform deconvolution from junction arrays and RNA-Seq.

Systematic Reconstruction of Splicing Regulatory Modules by Integrating Many RNA-seq Datasets

Revealing Missing Isoforms Encoded in the Human Genome by Integrating Genomic, Transcriptomic and Proteomic Data

IsoResolve: predicting splice isoform functions by integrating gene and isoform-level features with domain adaptation

Systematically Differentiating Functions for Alternatively Spliced Isoforms Through Integrating RNA-seq Data.

A systematic analysis of the effects of splicing on the diversity of post-translational modifications in protein isoforms

Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing

Co-expression Networks Uncover Regulation of Splicing and Transcription Markers of Disease.

In silico and in cellulo approaches for functional annotation of human protein splice variants

Uncovering the translatome impact of transcriptome induced diversity in eukaryotes: framework and innovative insights

Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning.