Highly Accurate Protein Structure Classification and Prediction

Indranil Sarkar, Anirban Saha
DOI: https://doi.org/10.1109/ICCSC56913.2023.10142975
2023-03-03
Abstract: Proteins are the main building blocks of every form of life known to us, and they are the actuators of the biophysical and chemical events occurring in living organisms. Their biological functions are enabled by their native structure, which plays a crucial role in the design of vaccines and drugs. This is one of the main motivations for predicting protein structure from the amino acid sequence, coupled with other information, to obtain highly accurate prediction and classification, which is one of the fundamental problems in computational biology. So far, little attention has been given to the inclusion of sidechain structure information and the prediction of the protein backbone. This paper shows that SidechainNet, a new dataset that extends ProteinNet, can be used to predict and classify protein structures more accurately. This is because SidechainNet includes angle and atomic coordinate information that describes almost all the heavy atoms of every protein structure. The paper first discusses background on the availability of protein structure data and the importance of ProteinNet. It then describes the additional information that SidechainNet provides, which helps predict protein structure more accurately. Finally, it shows how a machine learning model using SidechainNet as its dataset yields a highly accurate protein structure.
Biology, Computer Science