betAS: intuitive analysis and visualisation of differential alternative splicing using beta distributions

Mariana Ascensão-Ferreira, Rita Martins-Silva, Nuno Saraiva-Agostinho, Nuno L. Barbosa-Morais
DOI: https://doi.org/10.1101/2022.12.26.521935
2024-01-15
Abstract: Next-generation RNA sequencing allows alternative splicing (AS) quantification with unprecedented resolution, with the relative inclusion of an alternative sequence in transcripts being commonly quantified by the proportion of reads supporting it as percent spliced-in (PSI). However, PSI values do not incorporate information about precision, proportional to the respective AS events’ read coverage. Beta distributions are suitable to quantify inclusion levels of alternative sequences, using reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the PSI as its mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. We herein introduce a computational pipeline, based on beta distributions accurately modelling PSI values and their precision, to quantitatively and visually compare AS between groups of samples. Our methodology includes a differential splicing significance metric that combines the magnitude of inter-group differences, the estimation uncertainty in individual samples, and the intra-group variability, and is therefore suitable for multiple-group comparisons. To make our approach accessible and clear to both non-computational and computational biologists, we developed betAS, an interactive web app and user-friendly R package for visual and intuitive differential splicing analysis from read count data.
Bioinformatics
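The abstract's core modeling idea (inclusion and exclusion read counts used as the two beta shape parameters, so that the mean equals PSI and the spread shrinks as coverage grows) can be checked numerically. The sketch below is written in Python for illustration only and is not the betAS R API; the function names are hypothetical, and it assumes strictly positive counts, since how zero counts are handled is not described here.

```python
import math
import random


def beta_params(inc: int, exc: int) -> tuple[float, float]:
    """Use inclusion/exclusion read counts directly as Beta shape parameters.

    Assumption of this sketch: both counts must be positive; zero-count
    handling is not modeled here.
    """
    if inc <= 0 or exc <= 0:
        raise ValueError("this sketch expects positive inclusion and exclusion counts")
    return float(inc), float(exc)


def beta_mean_sd(alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form mean and standard deviation of Beta(alpha, beta)."""
    total = alpha + beta
    mean = alpha / total  # equals PSI = inc / (inc + exc)
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical events with the same PSI (0.75) but tenfold different coverage.
    for inc, exc in [(15, 5), (150, 50)]:
        a, b = beta_params(inc, exc)
        mean, sd = beta_mean_sd(a, b)
        # Monte Carlo draws, only to show the sampled mean agrees with PSI.
        draws = [rng.betavariate(a, b) for _ in range(5000)]
        print(f"inc={inc:4d} exc={exc:3d} PSI={mean:.3f} sd={sd:.4f} "
              f"sampled mean={sum(draws) / len(draws):.3f}")
```

For the two events above, both with PSI = 0.75, the standard deviation falls from about 0.094 at 20 reads to about 0.031 at 200 reads, which is the narrowing the abstract describes.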
What problem does this paper attempt to address?
This paper addresses how to evaluate differential alternative splicing (AS) events between groups of samples more accurately when AS is quantified from next-generation RNA sequencing data. Specifically, it proposes a beta-distribution-based method to model and visualize differential AS across conditions, making AS estimates more accurate, and provides an intuitive graphical tool that researchers without a computational biology background can use easily.

### Background and Problems

1. **Importance of alternative splicing**: Almost all protein-coding genes undergo alternative splicing, which greatly increases transcriptome diversity and plays an important role in the pre-translational regulation of gene expression. Different tissues and developmental stages fine-tune their transcriptomes through tightly regulated AS changes.
2. **Limitations of existing tools**: Most existing differential AS analysis tools perform poorly with small sample sizes. They usually rely only on linear models or geometric distances and ignore variability between and within samples. They are also often hard to understand and use for researchers without a computational biology background.
3. **Limitations of PSI values**: Percent spliced-in (PSI) is a commonly used metric for quantifying AS events, but it carries no information about precision, i.e. about how much read coverage supports the estimate.

### Methods Proposed in the Paper

1. **Beta distribution modeling**: The inclusion level of each AS event is modeled with a beta distribution whose mean is the PSI value. Higher read coverage yields a narrower distribution, so the distribution's width reflects the precision of the estimate.
2. **Significance assessment of differential alternative splicing**:
   - **Effect size**: ΔPSI, the difference in PSI values between two groups of samples.
   - **Significance indicator I, Pdiff**: the proportion of random points drawn for one condition that are greater than random points drawn for the other condition, reflecting the probability of differential AS.
   - **Significance indicator II, FPR**: a false positive rate estimated by simulation (generating random numbers) to test the null hypothesis that PSI values do not differ between the two groups.
   - **F-statistic**: the ratio of the median absolute between-group variation to the median absolute within-group variation, a single indicator that combines effect size and significance.
3. **Multiple-group comparison**: The above methods are extended to comparisons among more than two sample groups, and a monotonic trend analysis is introduced for time-series data.

### Tools and Applications

- **betAS**: a user-friendly R package and interactive web application for intuitively analyzing and visualizing differential alternative splicing from splice-junction read-count data.
- **User interface**: an easy-to-use interface that lets researchers without a programming background upload data, define sample groups, run analyses and visualize results.

### Conclusions

The methods and tools proposed in the paper aim to improve the accuracy and interpretability of differential alternative splicing analysis, especially with small sample sizes. Through beta distribution modeling and multiple significance indicators, betAS can more reliably identify biologically relevant AS changes, and its intuitive graphical interface helps researchers understand the analysis results.
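The two-group indicators listed under "Methods Proposed in the Paper" can be illustrated for a single AS event. The sketch below computes ΔPSI and a Pdiff-style probability in Python; it is not the betAS implementation, and every name in it is hypothetical. Its assumptions are: the read counts are made up, ΔPSI is taken as the difference between the groups' median sample PSIs, and Pdiff is estimated by pairing shuffled beta draws from the two pooled groups and counting how often the group with the higher median PSI also has the higher draw. The simulation-based FPR and the F-statistic are not reproduced here.

```python
import random
import statistics

# Hypothetical (inclusion, exclusion) read counts per sample for one AS event.
GROUP_A = [(30, 70), (25, 60), (40, 90)]
GROUP_B = [(80, 20), (65, 30), (90, 25)]


def sample_psi(inc: int, exc: int) -> float:
    """Per-sample PSI, i.e. the mean of Beta(inc, exc)."""
    return inc / (inc + exc)


def delta_psi(group_a, group_b) -> float:
    """Effect size; using the median of sample PSIs is an assumption of this sketch."""
    med_a = statistics.median(sample_psi(i, e) for i, e in group_a)
    med_b = statistics.median(sample_psi(i, e) for i, e in group_b)
    return med_b - med_a


def pooled_draws(group, n_per_sample: int, rng: random.Random) -> list[float]:
    """Draw n points from each sample's Beta(inc, exc) and pool them per group."""
    draws = []
    for inc, exc in group:
        draws.extend(rng.betavariate(inc, exc) for _ in range(n_per_sample))
    return draws


def p_diff(group_a, group_b, n_per_sample: int = 1000, seed: int = 0) -> float:
    """Fraction of randomly paired draws in which the group with the higher
    median PSI also has the higher draw (a sketch of the Pdiff idea)."""
    rng = random.Random(seed)
    a = pooled_draws(group_a, n_per_sample, rng)
    b = pooled_draws(group_b, n_per_sample, rng)
    # Shuffle so that zip() pairs draws at random across samples.
    rng.shuffle(a)
    rng.shuffle(b)
    sign = 1 if delta_psi(group_a, group_b) >= 0 else -1
    wins = sum(1 for x, y in zip(a, b) if sign * (y - x) > 0)
    return wins / min(len(a), len(b))


if __name__ == "__main__":
    print(f"dPSI  = {delta_psi(GROUP_A, GROUP_B):.3f}")
    print(f"Pdiff = {p_diff(GROUP_A, GROUP_B):.3f}")
```

With these made-up counts, ΔPSI is about 0.48 and Pdiff is close to 1, while comparing a group with itself gives a Pdiff near 0.5. Pairing shuffled draws keeps the cost linear in the number of draws; comparing every draw of one group with every draw of the other would use all sampled points but costs quadratic time.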