Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences

Thomas D. Schneider, Vishnu Jejjala
DOI: https://doi.org/10.1371/journal.pone.0222419
2019-10-30
Abstract: Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high dimensional spaces. Better sphere packing leads to better communications systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing density of sphere packing explains why 6 base restriction enzymes are so common.
Quantitative Methods, Information Theory
What problem does this paper attempt to address?
The core problem this paper addresses is: **Why can restriction endonucleases recognize specific DNA sequences so precisely, and why do most of them recognize sequences that are 4 or 6 base pairs long?** The authors use information theory and sphere-packing theory in high-dimensional space to explain both phenomena.

### Main problem decomposition

1. **Precise recognition mechanism**:
   - How do restriction endonucleases maintain high selectivity for specific DNA sequences despite thermal noise?
   - For example, EcoRI binds specifically to the GAATTC sequence, and even a single-base change significantly reduces its binding ability.
2. **Choice of sequence length**:
   - Why do most restriction endonucleases recognize sequences with a length of 4 or 6 base pairs?
   - Is this length related to some optimization mechanism?

### Solutions

By introducing information theory and sphere-packing theory in high-dimensional space, the authors propose the following solutions:

1. **High-dimensional space model**:
   - Model the binding process of restriction endonucleases as selecting among different states in a high-dimensional "coding space".
   - Each state is represented by a sphere in this space, and the distance between spheres represents how distinguishable the corresponding sequences are.
2. **Sphere-packing theory**:
   - In high-dimensional space, the sphere-packing density can be used to explain the precise recognition ability of restriction endonucleases.
   - For example, EcoRI works in 24-dimensional space, which enables it to use the Leech lattice, the densest known sphere packing, thus minimizing the error rate.
3. **Energy dissipation and signal-to-noise ratio**:
   - Analyze the relationship between energy dissipation (P) and thermal noise (N) to determine the operating efficiency of restriction endonucleases.
   - When P is close to N, the system has the highest efficiency while still ensuring a clear distinction between different states.

### Conclusions

- **Restriction endonucleases with 4 base pairs**: Usually work in 16-dimensional space, which also admits a comparatively dense sphere packing.
- **Restriction endonucleases with 6 base pairs**: Such as EcoRI, work in 24-dimensional space, using the optimal Leech lattice packing.
- These findings indicate that restriction endonucleases have evolved to operate efficiently in high-dimensional space, achieving precise DNA sequence recognition while minimizing the error rate.

In this way, the paper not only explains the precise recognition mechanism of restriction endonucleases but also shows how they optimized the dimension of their operating space during evolution.
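The link between site length and dimensionality in the conclusions can be illustrated with a short calculation. This is a minimal sketch, not the paper's own equation: it assumes a Shannon-form relation R = (D/2) · log2(P/N + 1) between the information R needed to pick a site, the number of dimensions D, and the ratio of dissipated energy to thermal noise P/N. It also assumes that each base contributes exactly 2 bits (one choice among four bases) and that P/N = 1, following the statement above that P is close to N. The function names are hypothetical.

```python
import math

# Assumption: choosing one of four bases takes log2(4) = 2 bits.
BITS_PER_BASE = math.log2(4)


def site_information_bits(site_length: int) -> float:
    """Bits needed to specify one exact sequence of `site_length` bases."""
    return site_length * BITS_PER_BASE


def coding_dimensions(info_bits: float, power_to_noise: float) -> float:
    """Solve the assumed relation R = (D / 2) * log2(P/N + 1) for D."""
    if power_to_noise <= 0:
        raise ValueError("power_to_noise must be positive")
    return 2.0 * info_bits / math.log2(power_to_noise + 1.0)


# Assumption: P/N = 1, i.e. dissipated energy equals thermal noise.
for length in (4, 6):
    bits = site_information_bits(length)
    dims = coding_dimensions(bits, power_to_noise=1.0)
    print(f"{length}-base site: {bits:.0f} bits -> {dims:.0f} dimensions")
```

Under these assumptions a 4-base site needs 8 bits and maps to 16 dimensions, while a 6-base site such as GAATTC needs 12 bits and maps to 24 dimensions, matching the figures quoted in the conclusions. Raising `power_to_noise` above 1 lowers the computed dimension, so the result depends directly on the assumed ratio.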