Similarity Analysis Between Transcription Factor Binding Sites by Bayesian Hypothesis Test.

Qian Liu, San-Yang Liu, Li-Fang Liu
2011-01-01
Journal of Information Science and Engineering
Abstract: Transcription factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFMs). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. We propose to identify and group similar profiles using a Bayesian hypothesis test between PFMs. We describe a column-by-column method for quantifying PFM similarity, based on the Bayes factor and the posterior probability of the null model that aligned columns are independent and identically distributed observations from the same multinomial distribution. We group TFBS frequency matrices from the less redundant JASPAR collection into matrix families by cluster analysis according to the Bayes factors and posterior probabilities of similar PFMs, and identify clusters of highly similar matrices. We further compare the performance of this method to the Pearson chi-square test on simulated data. The proposed method is simple, easy to implement, and outperforms the Pearson chi-square test in our tests. Taking the Pearson product-moment correlation coefficient as an objective performance criterion, the results indicate that the Bayesian test performs better than the classical methods on average.
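The column-by-column test described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the prior or how column scores are combined, so everything below is an assumption made for illustration: a symmetric Dirichlet prior with pseudocount `alpha`, a prior null probability `prior_null`, equal-width PFMs that are already aligned, and a summation of per-column log Bayes factors. The function names are hypothetical and are not taken from the paper.

```python
from math import lgamma, exp

def log_beta(a):
    """Log of the multivariate Beta function B(a)."""
    return sum(lgamma(v) for v in a) - lgamma(sum(a))

def column_log_bf(x, y, alpha=1.0):
    """Log Bayes factor (null over alternative) for two aligned count columns.

    Null: both columns are drawn from the same multinomial distribution.
    Alternative: each column has its own multinomial distribution.
    Assumption: symmetric Dirichlet(alpha) prior on the base probabilities.
    The multinomial coefficients cancel in the ratio and are omitted.
    """
    prior = [alpha] * len(x)
    pooled = [alpha + xi + yi for xi, yi in zip(x, y)]
    post_x = [alpha + xi for xi in x]
    post_y = [alpha + yi for yi in y]
    return (log_beta(pooled) + log_beta(prior)
            - log_beta(post_x) - log_beta(post_y))

def posterior_null(log_bf, prior_null=0.5):
    """Posterior probability of the null model given a log Bayes factor."""
    logit = log_bf + (lgamma(1) if False else 0.0)  # placeholder-free: no extra term
    logit += _log_odds(prior_null)
    # Numerically stable logistic function.
    if logit >= 0:
        return 1.0 / (1.0 + exp(-logit))
    z = exp(logit)
    return z / (1.0 + z)

def _log_odds(p):
    """Log prior odds of the null model."""
    from math import log
    return log(p) - log(1.0 - p)

def pfm_log_bf(pfm_a, pfm_b, alpha=1.0):
    """Sum of per-column log Bayes factors for two aligned, equal-width PFMs.

    Assumption: columns are treated as independent, so log factors add.
    Each PFM is a list of columns; each column is [A, C, G, T] counts.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("PFMs must have the same number of columns")
    return sum(column_log_bf(x, y, alpha) for x, y in zip(pfm_a, pfm_b))

# Two columns with the same dominant base versus different dominant bases.
same = column_log_bf([10, 0, 0, 0], [10, 0, 0, 0])
diff = column_log_bf([10, 0, 0, 0], [0, 10, 0, 0])
print(same > 0, diff < 0)
```

Under these assumptions, a positive log Bayes factor favors the null model that the two columns share one distribution, and a negative value favors distinct distributions. A pairwise matrix of such scores could then serve as input to a clustering step, although the specific clustering procedure used by the authors is not stated in the abstract.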