Distinguish Coding And Noncoding Sequences In A Complete Genome Using Fourier Transform

Yu Zhou, Li-qian Zhou, Zuguo Yu, V. Anh
DOI: https://doi.org/10.1109/ICNC.2007.333
2007-08-24
Abstract: A Fourier transform method is proposed to distinguish coding and non-coding sequences in a complete genome, based on a number sequence representation of the DNA sequence proposed in our previous paper (Zhou et al., J. Theor. Biol. 2005) and the imperfect periodicity of 3 in protein coding sequences. The three parameters P<sub>x(s̄)</sub>(1), P<sub>x(s̄)</sub>(1/3) and P<sub>x(s̄)</sub>(1/36) in the Fourier transform of the number sequence representation of DNA sequences are selected to form a three-dimensional parameter space. Each DNA sequence is then represented by a point in this space. The points corresponding to coding and non-coding sequences in the complete genome of prokaryotes are seen to fall into different regions. If the point (P<sub>x(s̄)</sub>(1), P<sub>x(s̄)</sub>(1/3), P<sub>x(s̄)</sub>(1/36)) for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as a coding sequence; otherwise, it is classified as a non-coding one. Fisher's discriminant algorithm is used to study the discriminant accuracy. The average discriminant accuracies p<sub>c</sub>, p<sub>nc</sub>, q<sub>c</sub> and q<sub>nc</sub> over all 51 prokaryotes obtained by the present method reach 81.02%, 92.27%, 80.77% and 92.24% respectively.
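The mapping from a DNA sequence to a point in the three-dimensional parameter space can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nucleotide-to-number mapping `NUMERIC_VALUE`, the normalisation of the power spectrum, and the function names `to_numbers`, `power` and `feature_point` are all assumptions, since the abstract does not give the representation from Zhou et al. (2005) or the exact definition of the parameters.

```python
import cmath

# Hypothetical nucleotide-to-number mapping, for illustration only.
# The paper uses the representation defined in Zhou et al. (2005),
# which the abstract does not spell out.
NUMERIC_VALUE = {"A": -2.0, "C": -1.0, "G": 1.0, "T": 2.0}

# The three frequencies named in the abstract.
FREQUENCIES = (1.0, 1.0 / 3.0, 1.0 / 36.0)


def to_numbers(seq):
    """Convert a DNA string into a list of numbers using the assumed mapping."""
    return [NUMERIC_VALUE[base] for base in seq.upper()]


def power(values, freq):
    """Power of the number sequence at one frequency.

    Assumed definition: |sum_k x_k * exp(-2*pi*i*freq*k)|^2 / N.
    The paper's normalisation may differ.
    """
    n = len(values)
    total = sum(
        v * cmath.exp(-2j * cmath.pi * freq * k) for k, v in enumerate(values)
    )
    return abs(total) ** 2 / n


def feature_point(seq):
    """Return the three spectral parameters of a sequence as a 3-tuple."""
    values = to_numbers(seq)
    return tuple(power(values, f) for f in FREQUENCIES)


if __name__ == "__main__":
    periodic = "ACG" * 40   # perfectly period-3 toy sequence
    uniform = "A" * 120     # no period-3 signal
    print(feature_point(periodic))
    print(feature_point(uniform))
```

Under these assumptions, the toy period-3 sequence gives a much larger value at frequency 1/3 than the uniform one, which is the property the method exploits. The resulting points could then be passed to a linear discriminant, such as Fisher's, to separate the coding and non-coding regions.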
Mathematics