Statistical analysis of promoter sequences based on position weight matrix

Liang XUN, Li ZHANG, Yisong ZHEN, Rutai HUI
DOI: https://doi.org/10.3321/j.issn:1000-0054.2006.07.022
2006-01-01
Abstract: Recent progress has shown that the distribution of transcription factor binding sites in the promoter region affects eukaryotic gene expression. The position weight matrix (PWM) algorithm, together with a novel sequence-scoring method, was used to analyze the distribution of binding sites for four liver-enriched transcription factors. The results show that the distribution of these binding sites in liver-specific genes differs significantly from that in non-liver genes, a difference that may be attributable to the mechanism of liver-specific expression and regulation. With this approach, liver-specific genes can be identified from statistical features extracted from the binding possibility distribution with an accuracy of 93.3%. The methodology can be used for further research on the regulatory mechanisms of eukaryotic genes.
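For readers unfamiliar with PWM scoring, the following is a minimal Python sketch of a standard log-odds PWM scan over a promoter sequence. It is not the novel scoring method of the paper, which the abstract does not describe. The function names `build_pwm` and `scan`, the pseudocount, the uniform background frequency, and the example binding sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Assumes a uniform background frequency and an additive pseudocount;
    both are illustrative choices, not taken from the paper.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return a list of (offset, score)."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical aligned binding sites, invented for this example only.
example_sites = ["TGACCT", "TGACCC", "TGAACT", "TGACCT"]
pwm = build_pwm(example_sites)

# Scan a short made-up promoter fragment and pick the best-scoring window.
scores = scan("AATGACCTGG", pwm)
best_offset, best_score = max(scores, key=lambda t: t[1])
```

The per-window scores along a promoter give a positional profile of likely binding sites. Summary statistics of such a profile are one plausible kind of feature for separating liver-specific from non-liver genes, though the features actually used in the paper are not stated in the abstract.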