Abstract: Recently developed deep learning techniques have significantly improved the accuracy of various speech and image recognition systems. In this paper we show how to adapt some of these techniques to create a novel chained convolutional architecture with next-step conditioning for improving performance on protein sequence prediction problems. We explore its value by demonstrating its ability to improve performance on eight-class secondary structure prediction. We first establish a state-of-the-art baseline by adapting recent advances in convolutional neural networks which were developed for vision tasks. This model achieves 70.0% per amino acid accuracy on the CB513 benchmark dataset without use of standard performance-boosting techniques such as ensembling or multitask learning. We then improve upon this state-of-the-art result using a novel chained prediction approach which frames the secondary structure prediction as a next-step prediction problem. This sequential model achieves 70.3% Q8 accuracy on CB513 with a single model; an ensemble of these models produces 71.4% Q8 accuracy on the same test set, improving upon the previous overall state of the art for the eight-class secondary structure problem. Our models are implemented using TensorFlow, an open-source machine learning software library available at http://TensorFlow.org; we aim to release the code for these experiments as part of the TensorFlow repository.
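
The abstract reports per-amino-acid (Q8) accuracy for single models and for an ensemble. As a rough illustration of what those two quantities mean, the Python sketch below computes Q8 accuracy over a labeled sequence and combines several models by averaging their per-residue class probabilities. The averaging rule, the function names, and the one-letter class alphabet are assumptions made for this example; the abstract does not say how the ensemble's outputs were combined.

```python
from typing import List, Sequence

# One-letter codes for the eight DSSP-style classes (an assumed alphabet,
# with "L" standing for loop/coil).
Q8_CLASSES = "HGIEBTSL"


def q8_accuracy(predicted: str, actual: str) -> float:
    """Fraction of residues whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def ensemble_predict(
    per_model_probs: Sequence[Sequence[Sequence[float]]],
    classes: str = Q8_CLASSES,
) -> str:
    """Average class probabilities across models, then pick the best class.

    per_model_probs[m][i][k] is model m's probability that residue i
    belongs to classes[k]. Averaging is an assumed combination rule.
    """
    n_models = len(per_model_probs)
    n_residues = len(per_model_probs[0])
    labels: List[str] = []
    for i in range(n_residues):
        avg = [
            sum(model[i][k] for model in per_model_probs) / n_models
            for k in range(len(classes))
        ]
        labels.append(classes[avg.index(max(avg))])
    return "".join(labels)
```

For example, `q8_accuracy("HHEE", "HHEL")` counts three matching residues out of four. On a benchmark such as CB513, the same ratio would be taken over all residues in the test set.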

What problem does this paper attempt to address?

This paper aims to improve the accuracy of protein secondary structure prediction. Specifically, drawing on recent deep learning techniques, the authors developed a novel chained convolutional neural network architecture and combined it with next-step conditioning to improve performance on protein sequence prediction problems.

### Problem Background

The structure of a protein is crucial to its function. The secondary structure of a protein refers to the local spatial arrangement of its amino acid residues, such as α-helices and β-sheets. Accurately predicting secondary structure helps scientists better understand a protein's function, which in turn can accelerate drug development and other biomedical research.

As the number of known protein sequences grows, experimentally determined secondary structure data have not kept pace. Computational methods are therefore becoming increasingly important in protein structure prediction. Traditional machine learning methods have made some progress in this field, but there is still much room for improvement.

### Core Problems of the Paper

1. **Improving the Prediction Accuracy of a Single Model**: By introducing techniques such as Batch Normalization, Dropout, weight-norm constraints, residual connections, and multi-scale convolutional filters, the authors constructed a new convolutional neural network architecture that raised the Q8 accuracy of a single model on the CB513 benchmark dataset to 70.0%.
2. **Introducing Next-Step Conditional Prediction**: To further improve prediction performance, the authors reframed protein secondary structure prediction as a chained prediction task, in which the current label is predicted conditionally on the previous labels. This method draws on the idea of language models in natural language processing. By supplying the true or predicted labels of previous steps as context during training, the model can better capture dependencies in the sequence. This method increased the Q8 accuracy on the CB513 dataset to 70.3%.
3. **Ensemble Model**: To verify the generalization ability of the model, the authors also trained and evaluated an ensemble of multiple independent models. The ensemble outperforms a single model, achieving a Q8 accuracy of 71.4% and exceeding the best previously reported results.

### Summary

The main contributions of this paper are as follows:

- Proposing a new convolutional neural network architecture that combines multiple deep learning techniques and significantly improves the accuracy of protein secondary structure prediction.
- Introducing the next-step conditional prediction method, which further enhances prediction performance by modeling the dependencies between labels.
- Showing that it is possible to surpass existing methods even without enhancement techniques such as multi-task learning, which demonstrates the strength of the model.

These improvements not only raise the accuracy of protein secondary structure prediction but also provide new ideas and technical references for future research.
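
To make the chained, next-step conditional idea concrete, here is a minimal Python sketch of left-to-right decoding in which each label is chosen from a score that depends on both the current residue's features and the labels already predicted. It is only an illustration under stated assumptions: `greedy_chain_decode`, `toy_score`, the `<s>` start token, and the two-class toy scoring rule are all hypothetical, greedy decoding is used purely for simplicity, and a trained convolutional network would take the place of `toy_score` in practice.

```python
from typing import Callable, Dict, List, Sequence

# A scoring function maps (features of residue t, previous labels) to a
# score per candidate label. In the paper this role is played by a trained
# network; here it is a hypothetical stand-in.
ScoreFn = Callable[[Sequence[float], Sequence[str]], Dict[str, float]]


def greedy_chain_decode(
    features: Sequence[Sequence[float]],
    score_fn: ScoreFn,
    context_size: int = 1,
    start_label: str = "<s>",
) -> List[str]:
    """Predict labels left to right, feeding earlier predictions back in."""
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    labels: List[str] = []
    for x in features:
        # Pad with start tokens so the first positions still get a
        # full-length context window.
        padded = [start_label] * context_size + labels
        context = padded[len(padded) - context_size:]
        scores = score_fn(x, context)
        labels.append(max(scores, key=scores.get))
    return labels


def toy_score(x: Sequence[float], context: Sequence[str]) -> Dict[str, float]:
    """Hypothetical scorer: x[0] is a made-up helix propensity in [0, 1]."""
    scores = {"H": x[0], "L": 1.0 - x[0]}
    prev = context[-1]
    if prev in scores:
        scores[prev] += 0.3  # bonus for continuing the previous label
    return scores


features = [[0.9], [0.4], [0.45], [0.1]]
print(greedy_chain_decode(features, toy_score))  # ['H', 'H', 'H', 'L']
```

In this toy run the second and third residues would be labeled `L` from their features alone, but conditioning on the preceding `H` keeps the helix going. During training, the same context slot can be filled with the true previous labels or with the model's own predictions, as the summary above describes.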

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning

46,XX/46,XY chromosome complement in amniotic fluid cell culture followed by the birth of a normal female child

Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation

Protein secondary structure prediction using deep convolutional neural fields

PS8-Net: A Deep Convolutional Neural Network to Predict the Eight-State Protein Secondary Structure

DeepACLSTM: deep asymmetric convolutional long short-term memory neural models for protein secondary structure prediction

Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction

High performance 1.3 μm vertical-cavity surface-emitting lasers with oxygen-implanted confinement regions and wafer-bonded mirrors

A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks

Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction

Improved protein structure prediction using potentials from deep learning

Localnet: A Simple Recurrent Neural Network Model for Protein Secondary Structure Prediction Using Local Amino Acid Sequences Only

Predicting protein secondary structure with Neural Machine Translation

Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information

An Efficient Method for Protein Secondary Structure Prediction

ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure

Recent developments in deep learning applied to protein structure prediction

Protein Secondary Structure Prediction Using Neural Network and Simulated Annealing Algorithm

Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction