A novel prediction method for protein DNA-binding residues based on neighboring residue correlations

Jiazhi Song, Guixia Liu, Jingqing Jiang
DOI: https://doi.org/10.1080/13102818.2022.2122871
2022-01-01
Biotechnology & Biotechnological Equipment
Abstract: Accurately identifying protein DNA-binding residues is important for understanding the mechanism of protein-DNA recognition and for protein function annotation. Many computational methods have been proposed to predict protein-DNA binding residues from protein sequence information. However, protein-DNA binding datasets are severely imbalanced, and on such data the under-sampling technique applied by most previous methods does not achieve satisfactory performance. In this study, an adjustment algorithm is proposed to offset the biased prediction results of the classifier. The adjustment algorithm uses the binding probability between the target residue and its neighboring residues to recover true binding residues that the classifier wrongly predicts as non-binding. The proposed prediction method with the adjustment algorithm achieves an area under the ROC curve (AUC) of 0.926 and 0.866 on two benchmark datasets and 0.882 on the independent test set. These results demonstrate that the proposed method can efficiently predict the specific residues involved in protein-DNA interactions.
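The abstract does not state the adjustment rule itself, so the following is only a rough illustration under one assumed reading, not the authors' algorithm. It assumes the neighbor correlation is estimated from training labels as the conditional probability that a residue binds given that the residue at offset d binds. A residue the classifier scored below the threshold is then raised to the strongest neighbor evidence, defined as that conditional probability times the score of a neighbor predicted as binding. The function names `neighbor_conditional_prob` and `adjust_predictions`, the window size, the max-combination rule and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def neighbor_conditional_prob(label_seqs: Sequence[Sequence[int]], window: int) -> Dict[int, float]:
    """Estimate P(y[i] = 1 | y[i + d] = 1) for each offset d in [-window, window], d != 0.

    Hypothetical stand-in for the paper's neighboring-residue correlation,
    computed from training sequences of 0/1 binding labels.
    """
    cond = {}
    for d in range(-window, window + 1):
        if d == 0:
            continue
        neighbor_binding = 0  # pairs where the neighbor at offset d binds
        both_binding = 0      # ... and the target residue binds as well
        for labels in label_seqs:
            n = len(labels)
            for i in range(n):
                j = i + d
                if 0 <= j < n and labels[j] == 1:
                    neighbor_binding += 1
                    both_binding += labels[i]
        cond[d] = both_binding / neighbor_binding if neighbor_binding else 0.0
    return cond


def adjust_predictions(
    scores: Sequence[float], cond: Dict[int, float], threshold: float = 0.5
) -> Tuple[List[float], List[int]]:
    """Raise the scores of residues predicted non-binding using neighbor evidence.

    Hypothetical rule: only residues below the threshold are adjusted, and only
    neighbors the classifier originally predicted as binding contribute
    (adjusted residues do not in turn adjust their own neighbors).
    """
    n = len(scores)
    pred = [1 if s >= threshold else 0 for s in scores]
    adjusted = list(scores)
    for i in range(n):
        if pred[i] == 1:
            continue
        evidence = 0.0
        for d, p in cond.items():
            j = i + d
            if 0 <= j < n and pred[j] == 1:
                # Evidence from one binding neighbor: correlation times its score.
                evidence = max(evidence, p * scores[j])
        adjusted[i] = max(scores[i], evidence)
    labels = [1 if s >= threshold else 0 for s in adjusted]
    return adjusted, labels


# Toy usage: estimate correlations from one training sequence, then adjust four scores.
cond = neighbor_conditional_prob([[0, 1, 1, 1, 0]], window=1)
adjusted, labels = adjust_predictions([0.2, 0.9, 0.45, 0.1], cond)
print(labels)  # [1, 1, 1, 0]
```

In the toy run, residues 0 and 2 sit next to a confidently predicted binding residue and are flipped to binding, while residue 3 has no binding neighbor and stays non-binding. Because the adjusted scores remain continuous, a ROC curve and AUC can still be computed on them.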