Abstract: The idea of this project is to study the relationship between protein structure and sequence using hidden Markov models and an artificial neural network. In this context, we have assumed two hidden Markov models. In the first model, we have taken protein secondary structures as hidden and protein sequences as observed. In the second model, we have taken protein sequences as hidden and protein structures as observed. The efficiencies of both hidden Markov models have been calculated. The results show that the efficiency of the first model is greater than that of the second one. These efficiencies are cross-validated using an artificial neural network. This signifies the importance of protein secondary structures as the main hidden controlling factors due to which we observe a particular amino acid sequence. It also signifies that protein secondary structure is more conserved than the amino acid sequence.
### What problems does this paper attempt to solve?
This paper, titled "A New Method for Protein Structure Prediction", explores the relationship between amino acid sequences and protein secondary structure using Hidden Markov Models (HMMs) and an Artificial Neural Network (ANN). In particular, the paper attempts to answer the following questions:
1. **The relationship between protein secondary structure and amino acid sequence**:
   - Can it be assumed that the protein's secondary structure is the main hidden controlling factor that determines its amino acid sequence?
   - Or, can it be assumed that the amino acid sequence is the main hidden controlling factor that determines the protein's secondary structure?
2. **Comparison of model efficiencies**:
   - Compare the efficiencies of two hypothesized models, namely:
     - First model: take the protein secondary structure as the hidden state and the amino acid sequence as the observed state.
     - Second model: take the amino acid sequence as the hidden state and the protein secondary structure as the observed state.
3. **Validation and cross-validation**:
   - Use an artificial neural network to cross-validate the efficiencies of these two models and determine which hypothesis is more reasonable.
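To make the two role assignments concrete, here is a minimal sketch. It rests on several assumptions, since the paper's estimation procedure and its definition of "efficiency" are not given here:

- Each HMM is trained by simple counting on an aligned amino-acid/secondary-structure pair, with add-one smoothing.
- "Efficiency" means the fraction of hidden states that Viterbi decoding recovers.
- `estimate_hmm`, `viterbi` and `efficiency` are hypothetical helpers.
- The two sequences are made-up toy strings, not data from the paper.

The only difference between the two models is which string is passed as `hidden` and which as `observed`.

```python
import math
from collections import Counter


def estimate_hmm(hidden, observed, pseudocount=1.0):
    """Estimate (states, pi, A, B) by counting over one aligned hidden/observed pair.

    Add-one (Laplace) smoothing is an assumption: it keeps unseen transitions
    and emissions at a small nonzero probability.
    """
    states = sorted(set(hidden))
    symbols = sorted(set(observed))
    n, m = len(states), len(symbols)
    # With a single training sequence, only hidden[0] contributes to pi.
    pi = {s: (pseudocount + (s == hidden[0])) / (pseudocount * n + 1) for s in states}
    trans = Counter(zip(hidden, hidden[1:]))  # counts of i -> j transitions
    emit = Counter(zip(hidden, observed))     # counts of state i emitting symbol o
    out_counts = Counter(hidden[:-1])         # transitions leaving each state
    state_counts = Counter(hidden)
    A = {i: {j: (trans[i, j] + pseudocount) / (out_counts[i] + pseudocount * n)
             for j in states} for i in states}
    B = {i: {o: (emit[i, o] + pseudocount) / (state_counts[i] + pseudocount * m)
             for o in symbols} for i in states}
    return states, pi, A, B


def viterbi(obs, states, pi, A, B):
    """Most probable hidden state path for obs, computed in log space."""
    V = [{s: math.log(pi[s]) + math.log(B[s][obs[0]]) for s in states}]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, col, ptr = V[-1], {}, {}
        for j in states:
            best_i = max(states, key=lambda i: prev[i] + math.log(A[i][j]))
            col[j] = prev[best_i] + math.log(A[best_i][j]) + math.log(B[j][o])
            ptr[j] = best_i
        V.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def efficiency(hidden, observed):
    """Hypothetical 'efficiency': fraction of hidden states recovered by Viterbi."""
    states, pi, A, B = estimate_hmm(hidden, observed)
    decoded = viterbi(observed, states, pi, A, B)
    return sum(d == h for d, h in zip(decoded, hidden)) / len(hidden)


# Toy, made-up aligned pair (NOT data from the paper): H = helix, E = strand, C = coil.
aa = "MKTAYIAKQRQISFVKSHFSRQ"
ss = "CCHHHHHHHHHCCEEEECCCCC"

model1 = efficiency(hidden=ss, observed=aa)  # secondary structure hidden, sequence observed
model2 = efficiency(hidden=aa, observed=ss)  # sequence hidden, secondary structure observed
print(f"model 1 efficiency: {model1:.2f}, model 2 efficiency: {model2:.2f}")
```

This sketch scores each model on its own training pair, which inflates the numbers. A meaningful comparison would train on one set of proteins and decode held-out ones. It would also need a way to handle observation symbols that never appear in training, because `B[s][o]` raises a `KeyError` for those.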
### Research background and importance
The three-dimensional structure of a protein is crucial to its function. However, predicting it directly from the amino acid sequence is a complex and computationally expensive problem, so more effective prediction methods have both theoretical and practical significance. Studying the relationship between protein secondary structure and amino acid sequence can offer new perspectives on protein folding mechanisms and may inform fields such as protein design and drug development.
### Main conclusions
According to the paper's abstract, the first model (taking the protein secondary structure as the hidden state and the amino acid sequence as the observed state) shows higher efficiency. This indicates that the protein secondary structure may be the main hidden controlling factor that determines the amino acid sequence, further suggesting that the protein secondary structure is more conserved than the amino acid sequence during the evolutionary process.
### Formula representation
When discussing Hidden Markov Models, some key formulas are mentioned in the paper. For example, the forward probability and the backward probability are defined as follows:
- Forward probability ($\alpha_t(j)$):
\[
\alpha_t(j)=P(o_1, o_2,\ldots, o_t, q_t = j|\lambda)
\]
where $o_1, o_2, \ldots, o_t$ is the partial observation sequence up to time $t$, $q_t = j$ means the model is in state $j$ at time $t$, and $\lambda$ denotes the model parameters.
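As a worked illustration, the forward probabilities can be computed recursively instead of by enumerating all $N^T$ hidden-state paths. This assumes the standard parameterization $\lambda = (A, B, \pi)$ with $N$ hidden states, transition probabilities $a_{ij}$, emission probabilities $b_j(o)$ and initial probabilities $\pi_j$. The paper's own notation may differ.
\[
\alpha_1(j)=\pi_j\, b_j(o_1),\qquad
\alpha_{t+1}(j)=\Big[\sum_{i=1}^{N}\alpha_t(i)\,a_{ij}\Big]\, b_j(o_{t+1}),\qquad
P(O|\lambda)=\sum_{i=1}^{N}\alpha_T(i)
\]
Each induction step costs $O(N^2)$, so the whole sequence costs $O(N^2 T)$.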
- Backward probability ($\beta_t(i)$):
\[
\beta_t(i)=P(o_{t + 1}, o_{t+2},\ldots, o_T|q_t = i,\lambda)
\]
where $o_{t+1}, o_{t+2}, \ldots, o_T$ is the remaining observation sequence from time $t+1$ to the final time $T$, $q_t = i$ means the model is in state $i$ at time $t$, and $\lambda$ denotes the model parameters.
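The backward probabilities follow a mirror-image recursion that runs from the end of the sequence towards the start. This again assumes the standard parameterization $\lambda = (A, B, \pi)$ with transition probabilities $a_{ij}$, emission probabilities $b_j(o)$, initial probabilities $\pi_j$ and $N$ hidden states, which the original does not spell out.
\[
\beta_T(i)=1,\qquad
\beta_t(i)=\sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\,\beta_{t+1}(j)\quad (t=T-1,\ldots,1),\qquad
P(O|\lambda)=\sum_{j=1}^{N}\pi_j\, b_j(o_1)\,\beta_1(j)
\]
The last identity gives the same likelihood as the forward pass, which is a useful consistency check.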
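These two quantities are the building blocks of HMM inference. Summing $\alpha_T(i)$ over all states gives the likelihood $P(O|\lambda)$ of an observation sequence under a model. The product $\alpha_t(i)\beta_t(i)$, divided by that likelihood, gives the posterior probability of being in state $i$ at time $t$. These posteriors are also used to re-estimate the parameters in the Baum-Welch algorithm.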
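The forward and backward probabilities defined above can be computed and combined as follows. This is a minimal sketch under several assumptions:

- States and observation symbols are integer indices.
- $\lambda$ is given as plain lists `pi`, `A` and `B`.
- The two-state "helix/coil" model and its numbers are purely hypothetical, not parameters from the paper.

The likelihood is $P(O|\lambda)=\sum_i \alpha_T(i)$, and the posterior is $\gamma_t(i)=\alpha_t(i)\beta_t(i)/P(O|\lambda)$.

```python
def forward(obs, pi, A, B):
    """alpha[t][j] = P(o_1..o_t, q_t = j | lambda); states and symbols are integer indices."""
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda), with beta = 1 at the last step."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):  # o is the observation one step after the row being filled
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def posterior(obs, pi, A, B):
    """Return P(O | lambda) and gamma[t][i] = P(q_t = i | O, lambda)."""
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])  # P(O | lambda) = sum_i alpha_T(i)
    gamma = [[a * b / likelihood for a, b in zip(a_t, b_t)]
             for a_t, b_t in zip(alpha, beta)]
    return likelihood, gamma


# Hypothetical 2-state toy model (state 0 = helix, 1 = coil; symbol 0 = hydrophobic, 1 = polar).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 0]

likelihood, gamma = posterior(obs, pi, A, B)
print(likelihood, gamma)
```

For real protein sequences, which are hundreds of residues long, these raw products underflow. A practical implementation would use the scaled forward-backward variant or work in log space.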