Abstract:

BACKGROUND: Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to transcriptome complexity and to the dynamics of gene regulation. The rapidly growing volume of genome-wide poly(A) site data generated by 3' end sequencing under various conditions provides a valuable resource for studying APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes. However, conventional gene clustering methods are not suitable for APA-related data, because they neither consider the information on poly(A) sites within each gene (e.g., location, abundance, and number) nor measure the association among poly(A) sites between two genes.

RESULTS: Here we propose a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data at both the poly(A) site level and the gene level, and it takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways, including abundance and relative usage, which exploits the advantages of 3' end deep sequencing in quantifying APA sites. On both real and synthetic poly(A) site data sets, cluster analysis shows that PASCCA outperforms other widely used distance measures under five performance metrics: connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered several distinct functional gene modules. We have made PASCCA available as an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.

CONCLUSIONS: By better handling the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could serve as a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites across biological conditions from emerging 3' end sequencing data, helping to address complex biological phenomena.
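The abstract describes quantifying the association between two genes by applying CCA to their poly(A) site data. The Python sketch below illustrates that idea under stated assumptions and is not PASCCA's R implementation: each gene is assumed to be a samples x poly(A) sites matrix of relative usage (one row per sample, with replicates treated as ordinary rows rather than modeled for within-group variability as PASCCA does), the association is taken as the first canonical correlation with a small ridge term `reg` (our assumption, needed because relative usages in a row sum to 1 and sample counts are small), and the gene-gene distance is assumed to be 1 - rho. All function names are hypothetical. The `dunn_index` helper follows the common definition of one of the five evaluation metrics (smallest between-cluster distance divided by largest within-cluster distance).

```python
import numpy as np


def relative_usage(counts, pseudocount=0.0):
    """Convert a samples x poly(A)-sites count matrix to relative usage.

    Each row is divided by the gene's total count in that sample, so the
    entries of a row sum to 1 (rows with a zero total stay all zero).
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def _inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def first_canonical_corr(x, y, reg=1e-3):
    """Largest canonical correlation between the columns of x and y.

    x: samples x p matrix (poly(A) sites of gene 1)
    y: samples x q matrix (poly(A) sites of gene 2)
    reg: ridge added to each covariance diagonal (assumption, see lead-in).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    # Singular values of Cxx^{-1/2} Cxy Cyy^{-1/2} are the canonical correlations.
    m = _inv_sqrt(cxx) @ cxy @ _inv_sqrt(cyy)
    rho = np.linalg.svd(m, compute_uv=False)[0]
    return float(np.clip(rho, 0.0, 1.0))


def cca_distance_matrix(genes, reg=1e-3):
    """Hypothetical gene-gene distance 1 - rho for a list of usage matrices."""
    k = len(genes)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = 1.0 - first_canonical_corr(genes[i], genes[j], reg=reg)
            dist[i, j] = dist[j, i] = d
    return dist


def dunn_index(dist, labels):
    """Smallest between-cluster distance / largest within-cluster distance.

    Requires at least one cluster with two or more members.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()
    min_separation = dist[~same].min()
    return float(min_separation / max_diameter)


# Toy example: genes A and B shift usage from site 1 to site 2 in the same way
# (at different depths); gene C follows an unrelated pattern.
a = relative_usage([[90, 10], [80, 20], [60, 40], [40, 60], [20, 80], [10, 90]])
b = relative_usage([[45, 5], [40, 10], [30, 20], [20, 30], [10, 40], [5, 45]])
c = relative_usage([[70, 30], [30, 70], [50, 50], [50, 50], [30, 70], [70, 30]])
dist = cca_distance_matrix([a, b, c])
print(np.round(dist, 3))
print(dunn_index(dist, [0, 0, 1]))
```

In this toy example, A and B should receive a distance close to 0 and C a distance close to 1 from both. Using relative usage rather than raw counts lets the measure respond to shifts between sites independently of overall expression; the resulting matrix could then be handed to any distance-based clustering routine, such as hierarchical clustering.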

Cluster Analysis of Replicated Alternative Polyadenylation Data Using Canonical Correlation Analysis.

PAcluster: Clustering Polyadenylation Site Data Using Canonical Correlation Analysis

A Two-Layer Model For Gene Clustering Using Poly(A) Site Data

Accurate transcriptome-wide identification and quantification of alternative polyadenylation from RNA-seq data with APAIQ

QuantifyPoly(A): Reshaping Alternative Polyadenylation Landscapes of Eukaryotes with Weighted Density Peak Clustering.

DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

scAPAmod: Profiling Alternative Polyadenylation Modalities in Single Cells from Single-Cell RNA-Seq Data

APASdb: a database describing alternative poly(A) sites and selection of heterogeneous cleavage sites downstream of poly(A) signals.

Integrative Analysis of Gene Expression and Alternative Polyadenylation from Single-Cell RNA-seq Data.

Alternative Polyadenylation: Methods, Findings, and Impacts

scAPAdb: a Comprehensive Database of Alternative Polyadenylation at Single-Cell Resolution.

scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data.

APAtrap: Identification and Quantification of Alternative Polyadenylation Sites from RNA-seq Data.

A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq

movAPA: Modeling and Visualization of Dynamics of Alternative Polyadenylation Across Biological Samples.

Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq (Supplemental Material)

Alternative Polyadenylation: Methods, Mechanism, Function, and Role in Cancer.

Analysis of alternative polyadenylation from single-cell RNA-seq using scDaPars reveals cell subpopulations invisible to gene expression

scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data

VAAPA: a Web Platform for Visualization and Analysis of Alternative Polyadenylation.

SNP2APA: a Database for Evaluating Effects of Genetic Variants on Alternative Polyadenylation in Human Cancers