Statistically Significant Strings are Related to Regulatory Elements in the Promoter Regions of Saccharomyces cerevisiae

Rui Hu,Bin Wang
DOI: https://doi.org/10.1016/S0378-4371%2800%2900488-X
2000-09-01
Abstract:Finding out statistically significant words in DNA and protein sequences forms the basis for many genetic studies. By applying the maximal entropy principle, we give one systematic way to study the nonrandom occurrence of words in DNA or protein sequences. Through comparison with experimental results, it was shown that patterns of regulatory binding sites in Saccharomyces cerevisiae(yeast) genomes tend to occur significantly in the promoter regions. We studied two correlated gene family of yeast. The method successfully extracts the binding sites varified by experiments in each family. Many putative regulatory sites in the upstream regions are proposed. The study also suggested that some regulatory sites are a ctive in both directions, while others show directional preference.
Biological Physics,Soft Condensed Matter,Data Analysis, Statistics and Probability,Quantitative Biology
What problem does this paper attempt to address?