An Integrated Approach of Sequence and Text Mining Technology for the Identification of Transcription Factor Binding Sites

Yun Xiong, Qing Yang, Boren Qiu, Yangyong Zhu
DOI: https://doi.org/10.1109/bibmw.2008.4686233
2008-01-01
Abstract: The study of the complex mechanisms that regulate gene expression at the level of transcription is an important and challenging issue in the post-genomic era. A crucial step is to identify transcription factor binding sites (TFBSs). However, the number of known TFBSs is limited, and the accuracy of state-of-the-art identification methods is still far from satisfactory. In this paper, a novel integrated method for mining transcription factor binding sites is presented, which combines a sequence data mining method with a text mining method. As a result, the method not only obtains putative TFBSs from sequence data sets but also acquires experimentally verified TFBSs from the literature. To evaluate the performance of our method, several experiments were conducted on real data sets. The results show that our integrated method outperforms each of its component algorithms used alone and, furthermore, achieves higher accuracy than existing algorithms.