Evidence for Non-Random Hydrophobicity Structures in Protein Chains

Anders Irbäck, Carsten Peterson, Frank Potthast
DOI: https://doi.org/10.1073/pnas.93.18.9533
1996-10-15
Abstract: The question of whether proteins originate from random sequences of amino acids is addressed. A statistical analysis is performed in terms of blocked and random walk values formed by binary hydrophobic assignments of the amino acids along the protein chains. Theoretical expectations of these variables from random distributions of hydrophobicities are compared with those obtained from functional proteins. The results, which are based upon proteins in the SWISS-PROT data base, convincingly show that the amino acid sequences in proteins differ from what is expected from random sequences in a statistically significant way. By performing Fourier transforms on the random walks one obtains additional evidence for non-randomness of the distributions. We have also analyzed results from a synthetic model containing only two amino-acid types, hydrophobic and hydrophilic. With reasonable criteria on good folding properties in terms of thermodynamical and kinetic behavior, sequences that fold well are isolated. Performing the same statistical analysis on the sequences that fold well indicates similar deviations from randomness as for the functional proteins. The deviations from randomness can be interpreted as originating from anticorrelations in terms of an Ising spin model for the hydrophobicities. Our results, which differ from previous investigations using other methods, might have impact on how permissive with respect to sequence specificity the protein folding process is: only sequences with non-random hydrophobicity distributions fold well. Other distributions give rise to energy landscapes with poor folding properties and hence did not survive the evolution.
Chemical Physics, Condensed Matter, High Energy Physics - Lattice
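The blocked variables mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a binary assignment σ_i = 1 for hydrophobic and σ_i = 0 for hydrophilic residues, using a hypothetical hydrophobic set `HYDROPHOBIC`; splits the chain into non-overlapping blocks of size s; and divides the mean squared deviation of the block sums by its exact expectation for a random permutation with the same composition. The helper names, the residue set, and this particular normalization are assumptions for illustration, not the paper's exact estimator.

```python
from typing import Sequence

# Hypothetical hydrophobic set, for illustration only; the paper's own
# binary assignment of the 20 amino acids may differ.
HYDROPHOBIC = frozenset("ACFILMVW")


def binary_hydrophobicity(seq: str) -> list[int]:
    """Map each residue to sigma_i = 1 (hydrophobic) or 0 (hydrophilic)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq.upper()]


def block_fluctuation_ratio(sigma: Sequence[int], s: int) -> float:
    """Mean squared deviation of non-overlapping block sums, divided by its
    exact expectation over random permutations of the same sequence.

    About 1: consistent with randomness. Below 1: hydrophobic residues are
    spread more evenly than chance (anticorrelation). Above 1: clustering.
    """
    n, n_h = len(sigma), sum(sigma)
    if not 1 <= s < n or not 0 < n_h < n:
        raise ValueError("need 1 <= s < N and both residue classes present")
    p = n_h / n
    k = n // s  # number of complete blocks; a leftover tail is ignored
    block_sums = [sum(sigma[j * s:(j + 1) * s]) for j in range(k)]
    observed = sum((b - s * p) ** 2 for b in block_sums) / k
    # Under a random permutation each block is s positions drawn without
    # replacement, so its hydrophobic count is hypergeometric with this variance.
    expected = s * p * (1 - p) * (n - s) / (n - 1)
    return observed / expected


if __name__ == "__main__":
    evenly_spread = binary_hydrophobicity("AK" * 8)    # alternating, N = 16
    clustered = binary_hydrophobicity("AAAAKKKK" * 2)  # runs of four, N = 16
    print(block_fluctuation_ratio(evenly_spread, 2))   # 0.0
    print(block_fluctuation_ratio(clustered, 4))       # 5.0
```

Using the exact hypergeometric expectation avoids simulating shuffles. On real sequences one would scan several block sizes s; under this normalization, values consistently below 1 would indicate the kind of anticorrelation the abstract describes.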
What problem does this paper attempt to address?
The problem this paper attempts to solve is whether proteins originate from random sequences of amino acids. Specifically, the authors use statistical methods to study the non-randomness of hydrophobicity patterns in protein chains. Using protein sequences from the SWISS-PROT database, they assign each amino acid a binary hydrophobicity (hydrophobic or hydrophilic), represent each chain as a random walk built from these assignments, and compare the results with what is expected for randomly distributed hydrophobicities. The authors also carry out the same analysis on a simplified synthetic model (the AB model) that contains only two amino-acid types, hydrophobic and hydrophilic.

### Main problems

1. **Do proteins originate from random amino acid sequences?**
   - Through statistical analysis, the authors examined the hydrophobicity distribution along protein chains. They found that the amino acid sequences of functional proteins differ from random sequences in a statistically significant way, indicating that proteins do not originate from completely random amino acid sequences.
2. **How does the non-randomness of the hydrophobicity distribution affect protein folding?**
   - Through the analysis of the AB model, the authors found that only sequences with particular, non-random hydrophobicity distributions fold well, while other distributions give rise to energy landscapes with poor folding properties. Such sequences may have been eliminated during evolution.

### Methods and results

- **Statistical analysis methods**:
  - **Blocking method**: The binary hydrophobicity sequence of each chain is divided into blocks, and the fluctuations of the resulting block variables are compared with those expected for random sequences.
    The results show that the fluctuations of the block variables follow different patterns for different fractions of hydrophobic residues, and that they differ between the interior and the ends of the sequences.
  - **Fourier transform method**: A Fourier transform of the random-walk representation is used to detect periodicities in the distribution of hydrophobic residues. Non-random behavior appears at the wavelength corresponding to the α-helix (about 3.6 residues per turn), as well as at longer wavelengths.
- **AB model analysis**:
  - The authors analyzed 300 randomly selected chains in the AB model, of which only about 10% had good folding properties according to thermodynamic and kinetic criteria. These well-folding sequences showed non-randomness similar to that of functional proteins, both in block-variable fluctuations and in Fourier components.

### Conclusions

- The analysis indicates that the amino acid sequences of functional proteins are significantly non-random in their hydrophobicity distribution, which differs from the results of some previous investigations using other methods. This non-randomness can be interpreted as anticorrelation, analogous to an antiferromagnetic interaction in a one-dimensional Ising model of the hydrophobicities.
- These results bear on how permissive the protein folding process is with respect to sequence specificity: only amino acid sequences with non-random hydrophobicity distributions fold well and are therefore retained during evolution.

### Significance

- The study deepens the understanding of how protein structure forms. Its finding that well-folding sequences carry non-random hydrophobicity patterns may also be relevant to protein design and engineering, and through them to biotechnological applications.
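The Fourier transform method described under Methods and results can be illustrated with a minimal Python sketch. It assumes σ_i = 1 for hydrophobic and σ_i = 0 for hydrophilic residues, maps these to steps of +1 and -1, centers the steps so the walk returns to zero, and compares the walk's power at one wavelength (3.6 residues by default, the α-helix period) with the mean power over random shuffles of the same sequence. The function names, the ±1 mapping, the centering, and the shuffle-based reference are assumptions for illustration; the paper's own normalization may differ.

```python
import cmath
import math
import random
from typing import Sequence


def centered_walk(sigma: Sequence[int]) -> list[float]:
    """Walk x_n = sum_{i<=n} (t_i - mean(t)), where t_i = +1 if sigma_i = 1
    and -1 otherwise (assumed mapping); the walk returns to 0 at n = N."""
    steps = [1.0 if s else -1.0 for s in sigma]
    mean = sum(steps) / len(steps)
    walk, x = [], 0.0
    for t in steps:
        x += t - mean
        walk.append(x)
    return walk


def walk_power(walk: Sequence[float], period: float) -> float:
    """Squared modulus of the walk's Fourier component at wavelength
    `period` (in residues), divided by the chain length."""
    omega = 2 * math.pi / period
    amp = sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(walk, start=1))
    return abs(amp) ** 2 / len(walk)


def power_ratio_vs_shuffles(sigma: Sequence[int], period: float = 3.6,
                            n_shuffles: int = 1000, seed: int = 0) -> float:
    """Observed power divided by the mean power over random permutations of
    the same sequence (a Monte Carlo stand-in for the random expectation).
    Values well above 1 indicate enhanced periodicity at this wavelength."""
    if not 0 < sum(sigma) < len(sigma):
        raise ValueError("need both hydrophobic and hydrophilic residues")
    rng = random.Random(seed)
    observed = walk_power(centered_walk(sigma), period)
    shuffled = list(sigma)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += walk_power(centered_walk(shuffled), period)
    return observed / (total / n_shuffles)


if __name__ == "__main__":
    # Hypothetical idealized amphipathic helix: hydrophobic wherever
    # cos(2*pi*n/3.6) > 0, for n = 1..36.
    helix = [1 if math.cos(2 * math.pi * n / 3.6) > 0 else 0 for n in range(1, 37)]
    print(power_ratio_vs_shuffles(helix))  # well above 1
```

Because the centered walk returns to zero, its Fourier component at angular frequency ω equals that of the centered steps divided by (1 - exp(-iω)). The ratio to the shuffled reference is therefore the same whether one transforms the walk or the steps themselves.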