Effect of Non-Target Examples on E.coli Promoters Recognition Using Neural Networks

PC Conilione, DH Wang
DOI: https://doi.org/10.1109/ijcnn.2005.1555848
2006-01-01
Abstract: Previous research into the recognition of E. coli promoters has focused on using raw DNA sequences and alignment methods to find informative features in the promoter regions. In this paper, we compare the classification accuracy of a neural network trained on DNA sequences encoded with an orthogonal representation of the nucleotides against that of a network trained on a set of high-level features derived from the DNA. We also evaluate how the type of non-promoter used in training and testing affects classification accuracy. We used 872 E. coli promoters and three types of non-promoters: random sequences with the same base frequencies as the promoter sequences, gene sequences selected from E. coli, and random sequences with the same base frequencies as the gene non-promoters. Raw DNA sequences were encoded using CODE-4; the high-level features were outlined by previous researchers and are formally defined in this paper. Contrary to expectation, the high-level features did not perform as well for promoter recognition as the CODE-4 DNA representation. The strongest determining factor in classification accuracy was the type of non-promoter used for training and testing. Overall, non-promoters drawn from coding regions and random sequences with the same base frequencies as the gene non-promoters resulted in the best classification accuracy.
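To make the two data-preparation ideas in the abstract concrete, here is a minimal Python sketch. It assumes that CODE-4 denotes a 4-bit one-hot (orthogonal) code per nucleotide; the particular bit assigned to each base, the function names, and the use of Python's `random` module are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter

# Assumed bit assignment: the abstract only states that the encoding is
# orthogonal, not which bit corresponds to which base.
CODE4 = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def encode_code4(seq):
    """Flatten a DNA string into a vector with 4 bits per base."""
    vec = []
    for base in seq.upper():
        vec.extend(CODE4[base])
    return vec

def base_frequencies(seqs):
    """Estimate the relative frequency of A, C, G, T over a set of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def random_non_promoter(freqs, length, rng):
    """Draw a random sequence whose bases follow the given frequencies."""
    bases = "ACGT"
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

# Usage with toy sequences (not data from the paper).
toy_promoters = ["TTGACA", "TATAAT"]
freqs = base_frequencies(toy_promoters)
negative = random_non_promoter(freqs, length=6, rng=random.Random(0))
features = encode_code4(negative)  # 24 inputs for a 6-base sequence
```

Under this scheme a sequence of n bases becomes a network input of 4n bits, and the same generator can produce either kind of random non-promoter by passing in frequencies estimated from the promoter set or from the gene set.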