Regulus infers signed regulatory relations from few samples' information using discretization and likelihood constraints
Marine Louarn,Guillaume Collet,Ève Barré,Thierry Fest,Olivier Dameron,Anne Siegel,Fabrice Chatonnet
DOI: https://doi.org/10.1371/journal.pcbi.1011816
2024-01-23
PLoS Computational Biology
Abstract: Transcriptional regulation is performed by transcription factors (TFs) binding to DNA in context-dependent regulatory regions and determines the activation or inhibition of gene expression. Current methods for inferring transcriptional regulatory circuits, based on one or all of TF, region and gene activity measurements, require a large number of samples to rank the candidate TF-gene regulation relations and rarely predict whether they are activations or inhibitions. We hypothesize that transcriptional regulatory circuits can be inferred from fewer samples by (1) fully integrating information on TF binding, gene expression and regulatory region accessibility, (2) reducing data complexity and (3) using biology-based likelihood constraints to determine the global consistency between a candidate TF-gene relation and the patterns of gene expression and region activation, as well as to qualify regulations as activations or inhibitions. We introduce Regulus, a method which computes TF-gene relations from gene expression, regulatory region activity and TF binding site data, together with the genomic locations of all entities. After gene expressions and region activities are aggregated into patterns, the data are integrated into an RDF (Resource Description Framework) endpoint. A dedicated SPARQL (SPARQL Protocol and RDF Query Language) query retrieves all potential relations between expressed TFs and genes involving active regulatory regions. These TF-region-gene relations are then filtered using biological likelihood constraints, which also make it possible to qualify them as activations or inhibitions. Regulus provides signed relations consistent with public databases and, when applied to biological data, identifies both known and potential new regulators.
Regulus is devoted to context-specific transcriptional circuit inference in human settings where samples are scarce and cell populations are closely related, using discretization into patterns and likelihood reasoning to decipher the most robust regulatory relations. Gene expression regulation is based on the activity of specialized regulatory proteins called transcription factors (TFs), which can bind DNA at specific sequences. Understanding the regulatory relations between TFs and genes in humans is fundamental in personalized clinical settings, to better decipher pathological mechanisms and to identify new therapeutic solutions. However, finding the main regulators of such systems is usually difficult, due to the scarcity of available samples and the biological closeness of the studied cell types. To overcome these issues, we introduce a new tool called Regulus. We use information from gene and TF expression, regulatory region activity and TF binding site occurrences to compute TF-gene relations. We then apply a likelihood reasoning step, based on biological knowledge of transcriptional regulation mechanisms, to select the most probable relations and assign each of them a function, either activation or inhibition. Finally, we reduce the list of potential TFs with a specificity/coverage filter and annotate it according to the existing literature. By testing Regulus on large-scale biological datasets, each describing four biological contexts, we show that this tool is able to i) identify both known and undescribed regulators consistent with all the gene expression and region accessibility constraints in each biological context, ii) include lowly expressed genes in its relations and iii) considerably limit the space of putative TF-gene relations.
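The idea of discretizing profiles into patterns and then signing a candidate TF-region-gene relation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Regulus itself: the function names `discretize` and `sign_relation`, the three-level encoding (-1, 0, 1), the `tol` threshold, and the consistency rule (same direction means activation, opposite direction means inhibition, evaluated only in contexts where the region is highly accessible). The actual discretization and likelihood constraints used by Regulus are not detailed in this record.

```python
from statistics import mean


def discretize(values, tol=0.25):
    """Map a profile over contexts to a pattern of -1 (low), 0 (mid), 1 (high).

    Hypothetical scheme: each value is compared with the profile mean,
    scaled by the profile range; `tol` is an arbitrary illustrative cutoff.
    """
    spread = max(values) - min(values)
    if spread == 0:
        # A flat profile carries no information about differences between contexts.
        return tuple(0 for _ in values)
    centre = mean(values)
    pattern = []
    for v in values:
        d = (v - centre) / spread
        pattern.append(1 if d > tol else -1 if d < -tol else 0)
    return tuple(pattern)


def sign_relation(tf_pattern, region_pattern, gene_pattern):
    """Return '+', '-' or None for a candidate TF-region-gene relation.

    Illustrative rule (an assumption, not the published constraints):
    only contexts where the region is highly accessible (level 1) and both
    TF and gene are clearly low or high are informative. The relation is an
    activation if TF and gene always agree there, an inhibition if they
    always disagree, and is rejected otherwise.
    """
    informative = [
        (t, g)
        for t, r, g in zip(tf_pattern, region_pattern, gene_pattern)
        if r == 1 and t != 0 and g != 0
    ]
    if not informative:
        return None
    if all(t == g for t, g in informative):
        return "+"
    if all(t == -g for t, g in informative):
        return "-"
    return None


# Toy profiles over four biological contexts (made-up numbers).
tf = discretize([1.0, 1.0, 10.0, 10.0])
region = discretize([0.0, 0.0, 5.0, 5.0])
gene_up = discretize([2.0, 2.0, 20.0, 20.0])
gene_down = discretize([20.0, 20.0, 2.0, 2.0])

print(sign_relation(tf, region, gene_up))
print(sign_relation(tf, region, gene_down))
```

In this toy setting the pattern step reduces each four-context profile to a short tuple, so the consistency check works on a handful of discrete levels rather than on raw measurements, which is the kind of complexity reduction that makes reasoning from few samples tractable.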
biochemical research methods,mathematical & computational biology