Abstract: The basic genetic law of life is built on the DNA double helix model and the central dogma of molecular biology (i.e., bidirectional DNA-RNA transcription and RNA-to-protein translation), in which DNA and most RNA carry coding information, and proteins make up the structure of the body and carry out most biological functions. This framework describes the flow of genetic information within and between individuals and treats proteins as the functional molecules of life, so under these rules the DNA that codes for protein (known as exons) is considered functional. In this review, we provide an overview of the origin of “junk” DNA and discuss in depth how “junk” DNA functions. A glimpse at the landscape of the human genome shows that only a very small fraction of our book of life is protein-coding DNA. By contrast, the large majority is non-coding DNA, which cannot be translated into protein and was long assumed to contain no information and to have no function. It has also been found that the genome size of an organism does not correlate well with its complexity, suggesting that large amounts of non-coding DNA exist in lower organisms. Such non-coding DNA has been regarded as useless and is commonly referred to as “junk” DNA. However, the advent of next-generation sequencing technologies and improved data analysis capabilities has made it possible to study so-called “junk” DNA systematically. First, genome-wide association studies have successfully identified many single nucleotide polymorphisms (SNPs) underlying susceptibility to disease; however, the majority of these SNPs are located in non-coding regions of the genome. Moreover, parts of the non-coding DNA are highly conserved between humans and mice. Together, these findings suggest that non-coding DNA is functional in some way. Second, it has been revealed that about 75 percent of our genome is actually transcribed. Transcripts that do not code for any protein are termed non-coding RNAs (ncRNAs).
These ncRNAs, such as the canonical transfer and ribosomal RNAs, as well as the more recently identified microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), have been shown to play important physiological roles in organisms. Deregulation of these ncRNAs has also been found to be relevant not only to tumorigenesis but also to neurological, cardiovascular, developmental, and other diseases. Here we discuss in detail the rapidly advancing fields of miRNA, lncRNA, and circRNA research, summarizing their production, gene structure and organization in the genome, and diverse functions. Although miRNAs have been well studied over the last decade, we are still at an early stage of understanding the nature and extent of the involvement of other ncRNAs in physiology and disease. A better understanding of the molecular mechanisms of ncRNAs should shed light on advances in therapeutic strategies and diagnostic approaches.

The Preferential Mode Analysis of DNA Sequence.

Method of persistent changing probability and information extraction in nucleotide sequences

Long-range correlations in DNA sequences using 2D DNA walk based on pairs of sequential nucleotides

Statistical Properties of Nucleotides in Human Chromosomes 21 and 22

Gene Prediction by the Noise-Assisted MEMD and Wavelet Transform for Identifying the Protein Coding Regions

The Mystery of “junk” DNA

Numerical Representation of DNA Sequences Based on Genetic Code Context and Its Applications in Periodicity Analysis of Genomes

Synonymous and non-synonymous transitions/transversions vividly disclose purifying selection in coding sequences

On the Law of Directionality of Genome Evolution

Usage Patterns of Codons Versus Complementary Codons among Cellular Organisms and Organelles

Modal Codon Usage: Assessing the Typical Codon Usage of a Genome

Adolescents’ Perceived Weight Associated With Depression in Young Adulthood: A Longitudinal Study

The Hypercube Structure of the Genetic Code Explains Conservative and Non-Conservative Aminoacid Substitutions in vivo and in vitro

Analysis of DNA Sequence Pattern Using Probabilistic Neural Network Model

Recent codon preference reversals in the lineage

A p-adic model of DNA sequence and genetic code

A binary representation of the genetic code

Long-Tail Feature of DNA Words Over- and Under-Representation in Coding Sequences

Application of a Stent-Graft After Initial Occlusion with Interlocking Detachable Coils for Treatment of Penetrating Atherosclerotic Ulcer of the Aorta

Detecting Positively Selected Sites from Amino Acid Sequences: an Implicit Codon Model

SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage