Improving Cis-Regulatory Elements Modeling by Consensus Scaffolded Mixture Models

HongShan Jiang, Ying Zhao, WenGuang Chen, WeiMin Zheng, XueGong Zhang
DOI: https://doi.org/10.1007/s11432-011-4374-9
2011-01-01
Science China Information Sciences
Abstract: A position weight matrix (PWM) is widely accepted as a probabilistic representation for modeling protein-DNA binding specificity. Previous studies showed that for factors which bind to divergent binding sites, mixtures of multiple PWMs improve performance. We propose a consensus scaffolded mixture PWM (CSM) model to improve cis-regulatory element modeling by allowing overlapping components represented by a set of PWMs, each of which corresponds to a binding pattern and is scaffolded by a degenerate consensus. In addition, we propose a learning algorithm that involves an initial structure learning stage based on frequent pattern mining and a refining stage based on the expectation maximization (EM) algorithm. We assess the merits of CSM using three independent criteria. In a case study of the transcription factor Leu3, the derived CSM models agree with conventional mixtures but fit better according to the Fermi-Dirac distribution. Analysis of the human-mouse conservation of predicted binding sites of 83 JASPAR transcription factors (TFs) shows that the CSM is as good as or better than the simple mixture, the context-specific independent (CSI) mixture, and the single PWM model for 83%, 84%, and 75% of the cases, respectively. Five-fold cross validation on 46 TRANSFAC datasets shows that the CSM model generalizes better than other mixture models.
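To make the mixture-of-PWMs idea in the abstract concrete, the following is a minimal Python sketch of the simple mixture baseline only, not the CSM model or the authors' learning algorithm. It scores a candidate site under a weighted set of PWMs and computes the per-component posterior that an EM E-step would use. The function names, the two-column toy matrices, and the equal mixture weights are all assumptions made for illustration; none of them come from the paper.

```python
def site_likelihood(pwm, site):
    """Probability of `site` under a single PWM.

    `pwm` is a list of columns, each a dict mapping a base to its probability.
    """
    p = 1.0
    for column, base in zip(pwm, site):
        p *= column[base]
    return p


def mixture_likelihood(weights, pwms, site):
    """Probability of `site` under a weighted mixture of PWMs."""
    return sum(w * site_likelihood(m, site) for w, m in zip(weights, pwms))


def responsibilities(weights, pwms, site):
    """Posterior probability of each component given `site` (EM E-step)."""
    joint = [w * site_likelihood(m, site) for w, m in zip(weights, pwms)]
    total = sum(joint)
    return [j / total for j in joint]


# Toy, hypothetical two-column PWMs describing two divergent binding patterns.
pwm_a = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
pwm_b = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
weights = [0.5, 0.5]  # assumed equal mixing proportions

site = "AC"
print(mixture_likelihood(weights, [pwm_a, pwm_b], site))
print(responsibilities(weights, [pwm_a, pwm_b], site))
```

In this toy setting the site "AC" is assigned almost entirely to the first component, which is the behavior a mixture relies on when a factor binds divergent site families. An M-step would then re-estimate each PWM column from sites weighted by these posteriors.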