Representation of DNA Sequences with Multiple Resolutions and BP Neural Network Based Classification

X Huang, DS Huang, HQ Wang, XM Zhao
DOI: https://doi.org/10.1109/IJCNN.2004.1380108
2004-01-01
Abstract: In this paper we propose a new representation of DNA sequences that constructs a word frequency vector at multiple resolutions, based on the chaos game representation. Compared with the traditional word frequency vector, it combines a range of resolutions and preserves the higher resolutions, while the dimension is greatly reduced. We detail the algorithm, which computes the coding format and then encodes each sequence. To evaluate the method, we represent Alu sequences in the proposed coding format and use the resulting vectors to train BP (backpropagation) neural networks to recognize Alu sequences. The experimental results show that this representation of DNA sequences is significant and efficient in biological data processing.
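To make the idea of a multi-resolution word frequency vector concrete, the following is a minimal sketch, not the paper's algorithm. It relies on the commonly noted correspondence between a chaos game representation gridded into 2^k x 2^k cells and the counts of length-k words, and it simply concatenates the frequency vectors for several word lengths. The abstract does not spell out the coding format that reduces the dimension, so that step is omitted here; the function names `kmer_frequencies` and `multi_resolution_vector` and the default `resolutions` are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of every length-k word, in lexicographic order."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    # Return all zeros if the sequence is too short to contain any valid word.
    return [counts[w] / total if total else 0.0 for w in words]


def multi_resolution_vector(seq, resolutions=(1, 2, 3)):
    """Concatenate word frequency vectors for each word length (resolution).

    Assumption: plain concatenation stands in for the paper's coding format,
    which is not described in the abstract.
    """
    seq = seq.upper()
    vector = []
    for k in resolutions:
        vector.extend(kmer_frequencies(seq, k))
    return vector


if __name__ == "__main__":
    vec = multi_resolution_vector("ACGTACGTTTGA", resolutions=(1, 2))
    print(len(vec))  # 4 + 16 entries
```

A vector built this way has 4 + 16 + 64 + ... entries, which grows quickly with the highest resolution; the paper's contribution is a coding format that keeps higher resolutions while reducing this dimension. Such vectors could then be fed to a feed-forward network trained with backpropagation.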