Abstract: The bacterial DNA sequences in the GenBank database were divided into coding and noncoding regions and examined for the base-trimer distribution in every triplet frame on the sense and antisense strands. The results revealed that, for the noncoding region, both strands have very similar base-trimer distributions and show no frame specificity; that is, DNA is symmetric in the noncoding region. For the coding region, on the other hand, the symmetry is broken only in the triplet framework, and we found a special triplet-frame-specific symmetry that appears when the two complementary strands of the coding region are read from their 5′ ends. In addition, frame specificity was also observed in the distribution of stop codons on the antisense strand of the coding region. When the antisense sequences of the open reading frames (ORFs) in the database are read in the three reading frames, the same reading frame as the corresponding ORF contains a significantly larger number of long open frames without stop codons (i.e., nonstop frames [NSFs]) than expected, while the number of NSFs in the other two reading frames is similar to the expected number. That is, NSFs as well as ORFs are maintained in a frame-specific manner, and in this sense, DNA becomes symmetrical even in the coding region. These two kinds of frame-specific symmetries indicate that only an ORF and its complementary triplets are specifically recognized and maintained in DNA. We suppose that the antisense strands as well as the sense strands in the coding region may be transcribed, thereby producing various kinds of proteins corresponding to NSFs, though their amount may not be large. The presence of these proteins should have some benefits for living organisms, and therefore we propose that these proteins are upcoming enzymes having novel functions.
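The frame-wise reading described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the function names are hypothetical, the stop codons are assumed to be those of the standard genetic code (TAA, TAG, TGA), and antisense frame 0 is assumed to be the frame whose triplets are the reverse complements of the ORF codons when the antisense strand is read from its 5′ end. No length threshold for a "long" NSF and no expected-count model are included, since the abstract does not specify them.

```python
from collections import Counter

# Assumption: standard genetic code stop codons, written as DNA triplets.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq, frame):
    """Yield the complete triplets of seq starting at offset frame (0, 1, or 2)."""
    for i in range(frame, len(seq) - 2, 3):
        yield seq[i:i + 3]


def trimer_counts(seq, frame):
    """Count base trimers in one triplet frame of one strand."""
    return Counter(codons(seq, frame))


def is_nonstop(seq, frame):
    """True if the given frame contains no stop codon (a nonstop frame)."""
    return not any(c in STOP_CODONS for c in codons(seq, frame))


def antisense_nonstop_frames(orf):
    """For one ORF, report whether each antisense frame is free of stop codons.

    Index 0 corresponds to the frame made of the ORF's complementary triplets,
    provided the ORF length is a multiple of three.
    """
    anti = reverse_complement(orf)
    return [is_nonstop(anti, frame) for frame in range(3)]
```

Applied to every ORF in a genome, tallying the results of `antisense_nonstop_frames` per frame would give observed NSF counts that could then be compared against whatever expected count one chooses to model; `trimer_counts` can be called on both strands and all three frames to compare base-trimer distributions.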

The combinatorics of overlapping genes

Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Creating overlapping genes by alternate-frame insertion

Overlapping Genes in the Human and Mouse Genomes.

Mutational Constraint Analysis Workflow for Overlapping Short Open Reading Frames and Genomic Neighbours

A Frame-Specific Symmetry of Complementary Strands of DNA Suggests the Existence of Genes on the Antisense Strand

Overlapping Genes Produce Proteins with Unusual Sequence Properties and Offer Insight into De Novo Protein Creation

Combinatorics From Bacterial Genomes

How antisense transcripts can evolve to encode novel proteins

Creating De Novo Overlapped Genes

The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs

The Shiftability of Protein Coding Genes: the Genetic Code Was Optimized for Frameshift Tolerating

On maximal almost balanced non-overlapping codes and non-overlapping codes with restricted run-lengths

Feature Identification of Compensatory Gene Pairs Without Sequence Homology in Yeast

Interconnected Codons: Unravelling the Epigenetic Significance of Flanking Sequences in CpG Dyads

Genetic and selective constraints on the optimization of gene product diversity

Overlapping Probabilities of Top Ranking Gene Lists, Hypergeometric Distribution, and Stringency of Gene Selection Criterion

The Hypercube Structure of the Genetic Code Explains Conservative and Non-Conservative Aminoacid Substitutions in vivo and in vitro

Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences.

Accurately Annotate Compound Effects of Genetic Variants Using a Context-Sensitive Framework.

Comparison Of Various Algorithms For Recognizing Short Coding Sequences Of Human Genes