Identifying Cell Type Specific TF Combinatorial Regulation Via a Two-Stage Statistical Method

Kairong Liu, Andrew Hutchins, Yong Wang
DOI: https://doi.org/10.1109/bigcomp48618.2020.00-50
2020-01-01
Abstract: Transcription factors (TFs) are sequence-specific DNA-binding proteins that control the rate at which genetic information is transcribed from DNA to messenger RNA. Many TFs work together as complexes to perform their functions and to change cell morphology or activity during cell fate determination and cellular differentiation. In this paper, we propose a Two-Stage Statistical Method (TSSM) to study the interactions among TFs in a tissue-specific manner. We find that TF genes tend to be specifically expressed across cell types and have expression patterns significantly different from those of non-TF genes. This motivates us to infer TF interactions in two stages. In the first stage, we assess the global correlation of two TFs across all cell types by counting the number of overlapping cell types (those in which both TFs are highly expressed) and computing the fold change and the hypergeometric test p-value of this overlap. In the second stage, we assess the local correlation with the Pearson correlation coefficient across those highly expressed cell types. TSSM combines the two stages via Fisher's method and identifies the TF pairs that interact in those highly expressed cell types. This allows us to probe the dynamics of TF combinatorial regulation across multiple tissues or cell types. We compile a large collection of RNA-seq data covering 231 cell types in mouse. The 3,876 predicted TF interactions significantly overlap with experimentally determined TF combinations and with tissue-specific regulatory networks in human. In addition, TSSM outperforms existing correlation methods when experimental data are used as the gold standard. Taken together, TSSM serves as a useful tool for probing the combinatorial regulation mechanisms of TFs across multiple tissues or cell types.