Long-Tail Feature of DNA Words Over- and Under-Representation in Coding Sequences

A. Nowicka, M. R. Dudek, S. Cebrat, M. Kowalczuk, P. Mackiewicz, M. Dudkiewicz, D. Szczepanik
DOI: https://doi.org/10.48550/arXiv.cond-mat/0102348
2001-02-20
Soft Condensed Matter
Abstract: We have analyzed DNA sequences of known genes from 16 yeast chromosomes (Saccharomyces cerevisiae) in terms of oligonucleotides. We have noticed that the relative abundances of oligonucleotide usage in the genome follow a long-tail Lévy-like distribution. We have observed that long genes often use strongly over-represented and under-represented nucleotides, whereas this was not the case for the short genes (shorter than 300 nucleotides) under consideration. If selection on the extremely over-represented/under-represented oligonucleotides were strong, long genes would be more affected by spontaneous mutations than short ones.
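
The abstract does not say how relative abundance is computed. As a rough illustration only, one common convention is the ratio of an oligonucleotide's observed frequency to the frequency expected if bases occurred independently. The sketch below assumes that zero-order model; the function name `relative_abundance` and the parameter `k` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import prod
from typing import Dict


def relative_abundance(seq: str, k: int = 3) -> Dict[str, float]:
    """Observed/expected frequency ratio for each k-mer found in seq.

    ASSUMPTION: the expected frequency is the product of single-base
    frequencies (independent bases). The paper may use another model.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}  # sequence shorter than k: no oligonucleotides to count

    # Single-base frequencies over the whole sequence.
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}

    # Counts of overlapping k-mers (sliding window, step 1).
    kmer_counts = Counter(seq[i:i + k] for i in range(n_windows))

    return {
        kmer: (count / n_windows) / prod(base_freq[b] for b in kmer)
        for kmer, count in kmer_counts.items()
    }


if __name__ == "__main__":
    ratios = relative_abundance("ATGGCGATGGCGTTTAAATGA", k=3)
    # Print k-mers from most over-represented to most under-represented.
    for kmer, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(kmer, round(ratio, 3))
```

Under this convention, ratios well above 1 mark over-represented oligonucleotides and ratios well below 1 mark under-represented ones. Only oligonucleotides that actually occur are returned, so absent k-mers would need separate handling.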