Automated identification of small molecules in cryo-electron microscopy data with density- and energy-guided evaluation

Andrew Muenks, Daniel P Farrell, Guangfeng Zhou, Frank DiMaio
DOI: https://doi.org/10.1101/2024.11.20.623795
2024-11-20
Abstract: Methodological improvements in cryo-electron microscopy (cryoEM) have made it a useful tool in ligand-bound structure determination for biology and drug design. However, determining the conformation and identity of bound ligands is still challenging at the resolutions typical for cryoEM. Automated methods can aid in ligand conformational modeling, but current ligand identification tools, developed for X-ray crystallography data, perform poorly at resolutions common for cryoEM. Here, we present EMERALD-ID, a method capable of docking and evaluating small molecule conformations for ligand identification. EMERALD-ID identifies 43% of common ligands exactly and identifies closely related ligands in 66% of cases. We then use this tool to discover possible ligand identification errors, as well as previously unidentified ligands. Furthermore, we show EMERALD-ID is capable of identifying ligands from custom ligand libraries of various small molecule types, including human metabolites and drug fragments. Our method provides a valuable addition to cryoEM modeling tools to improve small molecule model accuracy and quality.
Biology
What problem does this paper attempt to address?
This paper addresses the challenge of automatically identifying small-molecule ligands in cryo-electron microscopy (cryoEM) data. Specifically:

1. **The ligand identification problem in cryoEM data**: Although cryoEM has made significant progress in biological structure determination and drug design, determining the conformation and identity of bound ligands remains challenging at typical cryoEM resolutions. Existing ligand identification tools were developed mainly for X-ray crystallography data and perform poorly at the resolutions common for cryoEM.
2. **Limitations of existing methods**: Current automated ligand identification methods rely on density map correlations or shape features of the map, and their accuracy is limited when the resolution is worse (numerically larger) than 3 Å. In addition, although existing deep-learning methods can predict protein structures, they cannot determine the identity of ligands and do not use the information in electron microscopy maps.
3. **The proposed new method**: To address these problems, the authors propose EMERALD-ID, which docks small-molecule ligands and evaluates their conformations in order to improve the accuracy and quality of ligand identification. EMERALD-ID uses the RosettaGenFF small-molecule force field, the EMERALD ligand fitting method, and a linear regression model that combines estimated binding affinity and density correlation to distinguish ligand identities. With this approach, EMERALD-ID identifies 43% of ligands from a common ligand library exactly and identifies a closely related ligand in 66% of cases. The method can also flag possible ligand identification errors and reveal previously unidentified ligands.
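
The idea of combining density correlation with estimated binding affinity in a linear model can be illustrated with a small Python sketch. This is not the EMERALD-ID implementation: the function names, the `Candidate` fields, the weights `W_CC`, `W_ENERGY`, and `INTERCEPT`, and the example ligands and numbers are all hypothetical, chosen only to show how two such terms could be combined to rank candidate identities. It assumes each candidate has already been docked and that a lower binding energy means tighter binding.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


def density_correlation(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Pearson correlation between two flattened density grids of equal length."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("grids must be non-empty and the same length")
    mean_o = sum(observed) / len(observed)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    denom = sqrt(var_o * var_s)
    # A flat grid has no variance, so the correlation is undefined; report 0.
    return cov / denom if denom > 0 else 0.0


@dataclass
class Candidate:
    name: str
    density_cc: float      # map-to-model correlation of the best docked pose
    binding_energy: float  # estimated binding energy; lower is assumed better


# Hypothetical weights for illustration only; these are NOT fitted values
# from the paper.
W_CC = 1.0
W_ENERGY = -0.05
INTERCEPT = 0.0


def score(candidate: Candidate) -> float:
    """Linear combination of the density and energy terms."""
    return (INTERCEPT
            + W_CC * candidate.density_cc
            + W_ENERGY * candidate.binding_energy)


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates sorted from most to least likely identity."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up example values, not measurements.
    library = [
        Candidate("ADP", density_cc=0.78, binding_energy=-10.0),
        Candidate("ATP", density_cc=0.80, binding_energy=-12.0),
        Candidate("GTP", density_cc=0.60, binding_energy=-11.0),
    ]
    for c in rank_candidates(library):
        print(f"{c.name}: {score(c):.2f}")
```

In this sketch the negative energy weight means that a more favorable (more negative) binding energy raises the score, so a candidate that fits the density slightly worse can still rank higher if its estimated binding is much stronger. In practice the weights would have to be fitted on ligands of known identity.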