HSADab: A Comprehensive Database for Human Serum Albumin

Zhaoxi Sun, Lei Zheng, Zhaoyi Zeng, Yao Zhao, Xiao Liu, Zhe Huai, Xudong Zhang, John Z.H. Zhang
DOI: https://doi.org/10.26434/chemrxiv-2024-rh9r2
2024-05-20
Abstract: Human Serum Albumin (HSA), the most prevalent protein in human body fluids, is integral to the transportation, absorption, metabolism, distribution, and excretion of drugs. Its influence on a drug's therapeutic efficacy is substantial. Despite the importance of HSA as a drug target, the available data on its interactions with external agents (e.g., drug-like molecules and antibodies) are rather limited, which poses challenges for both molecular modelling investigations and the development of empirical scoring functions or machine learning predictors on this target. Moreover, the reported entries in existing databases often contain major inconsistencies due to varied experiments and conditions, which raises concerns about data quality. To address these issues, we established a pioneering database by extensively reviewing more than 30,000 scientific publications published between 1987 and 2023, encompassing over 5,000 affinity data points at multiple temperatures and more than 130 crystal structures that involve both the ligand-bound and apo forms. The current HSADab resource (www.hsadab.cn) serves as a reliable foundation for protocol validation of molecular simulations (e.g., traditional virtual screening workflows using docking, end-point and alchemical free energy techniques) as well as a data source for the implementation of machine learning predictors.
Chemistry
What problem does this paper attempt to address?
The paper attempts to address two problems with the data on human serum albumin (HSA) and its interactions with drugs: scarcity and low quality.
1. **Data Scarcity**: Although HSA plays a crucial role in drug transport, absorption, metabolism, distribution, and excretion, existing data on HSA interactions with external molecules (such as drug-like molecules and antibodies) are very limited. This poses challenges for molecular modeling studies and for the development of empirical scoring functions or machine learning prediction models.
2. **Low Data Quality**: Records in existing databases often exhibit significant inconsistencies due to varying experiments and conditions, which calls the quality of the data into question.
To address these issues, the authors established a new database called HSADab. By extensively reviewing more than 30,000 publications from 1987 to 2023, they collected more than 5,000 affinity data points at multiple temperatures and more than 130 crystal structures. The crystal structures cover both the ligand-bound and apo (unbound) forms of HSA. The database provides a reliable foundation for validating molecular simulation protocols and can serve as a data source for developing machine learning prediction models.