BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias

Hyojin Son, Sechan Lee, Jaeuk Kim, Haangik Park, Myeong-Ha Hwang, Gwan-Su Yi
DOI: https://doi.org/10.1186/s12859-024-05968-3
IF: 3.307
2024-11-02
BMC Bioinformatics
Abstract: Deep learning-based drug-target affinity (DTA) prediction methods have shown impressive performance, despite a high number of training parameters relative to the available data. Previous studies have highlighted the presence of dataset bias by suggesting that models trained solely on protein or ligand structures may perform similarly to those trained on complex structures. However, these studies did not propose solutions and focused solely on analyzing complex structure-based models. Even when ligands are excluded, protein-only models trained on complex structures still incorporate some ligand information at the binding sites. Therefore, it is unclear whether binding affinity can be accurately predicted using only compound or protein features due to potential dataset bias.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?