Quantitative estimates of the regulatory influence of long non-coding RNAs on global gene expression variation using TCGA breast cancer transcriptomic data
Xiaoman Xie,Saurabh Sinha
DOI: https://doi.org/10.1371/journal.pcbi.1012103
2024-06-06
PLoS Computational Biology
Abstract: Long non-coding RNAs (lncRNAs) have received attention in recent years for their regulatory roles in diverse biological contexts including cancer, yet large gaps remain in our understanding of their mechanisms and global maps of their targets. In this work, we investigated a basic unanswered question of lncRNA systems biology: to what extent can gene expression variation across individuals be attributed to lncRNA-driven regulation? To answer this, we analyzed RNA-seq data from a cohort of breast cancer patients, explaining each gene's expression variation using a small set of automatically selected lncRNA regulators. A key aspect of this analysis is that it accounts for confounding effects of transcription factors (TFs) as common regulators of a lncRNA-mRNA pair, to enrich the explained gene expression for lncRNA-mediated regulation. We found that for 16% of analyzed genes, lncRNAs can explain more than 20% of expression variation. We observed that 25–50% of the putative regulator lncRNAs are in 'cis' to the target gene, i.e., overlapping or located proximally to it. This led us to quantify the global regulatory impact of such cis-located lncRNAs, which was found to be substantially greater than that of trans-located lncRNAs. Additionally, by including statistical interaction terms involving lncRNA-protein pairs as predictors in our regression models, we identified cases where a lncRNA's regulatory effect depends on the presence of a TF or RNA-binding protein. Finally, we created a high-confidence lncRNA-gene regulatory network whose edges are supported by co-expression as well as a plausible mechanism such as cis-action, protein scaffolding or competing endogenous RNAs. Our work is a first attempt to quantify the extent of gene expression control exerted globally by lncRNAs, especially those located proximally to their regulatory targets, in a specific biological (breast cancer) context.
It also marks a first step towards systematic reconstruction of lncRNA regulatory networks, going beyond the current paradigm of co-expression networks, and motivates future analyses assessing the generalizability of our findings to additional biological contexts.
Many studies have reported the role of long non-coding RNAs in regulating gene expression, yet there has not been a systematic assessment of the extent to which gene expression variance can be explained by lncRNA regulation. lncRNA-mRNA co-expression networks are commonly constructed using pair-wise correlations, but these approaches do not claim to quantify lncRNA-driven regulation and may be confounded by shared transcriptional regulators of lncRNA-mRNA pairs. Here, we used a multivariable linear regression model to predict mRNA expression using lncRNAs' expression in a large breast cancer transcriptomic dataset from TCGA, and obtained estimates for the overall gene expression variance under lncRNA regulation. We also integrated orthogonal criteria such as genome proximity and protein binding information to refine these estimates and identified lncRNA-mRNA associations that are both statistically significant and biologically supported.
biochemical research methods,mathematical & computational biology
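The idea of attributing expression variance to lncRNAs while accounting for TF confounding can be illustrated with a small sketch. This is not the authors' pipeline: it uses simulated data, plain ordinary least squares, and a simple nested-model comparison (the gain in R² when lncRNAs are added to a TF-only model). All names (`r_squared`, `lncrna_variance_explained`) and the simulated effect sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss

def lncrna_variance_explained(tf_expr, lnc_expr, gene_expr):
    """Compare a TF-only model with a TF + lncRNA model for one target gene.

    Returns (R^2 of TF-only model, R^2 of full model, gain in R^2).
    The gain is read here as variance explained by lncRNAs beyond the TFs.
    """
    r2_tf = r_squared(tf_expr, gene_expr)
    r2_full = r_squared(np.column_stack([tf_expr, lnc_expr]), gene_expr)
    return r2_tf, r2_full, r2_full - r2_tf

# Simulated cohort (assumption: 500 samples, 2 TFs, 3 candidate lncRNAs).
rng = np.random.default_rng(0)
n = 500
tf = rng.normal(size=(n, 2))
lnc = rng.normal(size=(n, 3))
# lncRNA 0 shares a TF regulator with the gene but does not regulate it.
lnc[:, 0] += 0.8 * tf[:, 0]
# The gene is driven by TF 0 and by lncRNA 1, plus noise.
gene = 1.0 * tf[:, 0] + 0.7 * lnc[:, 1] + rng.normal(scale=0.5, size=n)

r2_tf, r2_full, gain = lncrna_variance_explained(tf, lnc, gene)
# Naive estimate that ignores TFs, for comparison.
r2_naive = r_squared(lnc, gene)

print(f"TF-only R^2:            {r2_tf:.3f}")
print(f"TF + lncRNA R^2:        {r2_full:.3f}")
print(f"lncRNA gain beyond TFs: {gain:.3f}")
print(f"lncRNA-only (naive):    {r2_naive:.3f}")
```

In this simulation the lncRNA-only fit credits lncRNAs with variance that actually comes from the shared TF, which is the kind of confounding that pair-wise co-expression analyses are exposed to. The nested comparison removes that shared component.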