Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm

Li Tang,Zheng Zhao,Lei Zhang,Tao Zhang,Shan Gao
DOI: https://doi.org/10.1007/978-3-319-09330-7_1
2014-01-01
Abstract:The beta sheet, as one of the three common second form of regular secondary structure in proteins plays an important role in protein function. The best strands in a beta sheet can be classified into the outer or inner strands. Considering the protein primary sequences have determinant information to arrange the strands in the beta sheet topology, we introduce an approach by using the random forest algorithm to predict outer or inner arrangement of a beta strand. We use nine features to describe a strand based on the hydrophobicity, the hydrophilicity, the side-chain mass and other properties of the beta strands. The random forest classifiers reach the best prediction accuracy 89.45% with 10-fold cross-validation among five machine learning methods. This result demonstrates that there are significant differences between the outer beta strands and the inner ones in beta sheets. The finding in this study can be used to arrange beta strands in a beta sheet without any prior structure information. It can also help better understanding the mechanisms of protein beta sheet formation.
What problem does this paper attempt to address?