3D2SMILES: Translating Physical Molecular Models into Digital DeepSMILES Notations Using Deep Learning

Wenqi Guo, Yiyang Du, Mohamed Shehata
DOI: https://doi.org/10.26434/chemrxiv-2024-zvcb4
2024-08-05
Abstract: Physical molecular models are widely used in educational settings for teaching organic and other branches of chemistry, offering an intuitive way of understanding molecular structures. Conversely, virtual models, while less intuitive, provide additional functionalities such as the ability to retrieve molecular names and other properties. Currently, to the best of our knowledge, there is a gap between physical 3D molecular models and their digital counterparts. This paper introduces a computer vision model designed to bridge this gap by converting images of physical molecular models into their digital DeepSMILES representations. This conversion facilitates further information retrieval, enhancing educational utility. We developed both synthetic and real datasets to train our model and evaluated its performance across various dataset combinations, model architectures, and dataset sizes. Additionally, we attempted to improve the model's accuracy through multi-image input and beam search. We achieved 62.0% top-1 accuracy and 80.3% top-3 accuracy with beam search and multi-image input on our validation set. We also explored the model's characteristics, including explainability through saliency maps, error analysis, and calibration. Finally, we discuss the model's limitations and directions for future research.
Chemistry
What problem does this paper attempt to address?
The main goal of this paper is to develop a computer vision model that converts images of physical molecular models into a digital representation, specifically DeepSMILES notation, a variant of the Simplified Molecular Input Line Entry System (SMILES). Once a physical model has a digital counterpart, further information such as the molecule's name and properties can be retrieved, which increases the educational value of physical models. To this end, the researchers constructed two datasets: a synthetic dataset of computer-generated 3D molecular models and a dataset of real-world physical models. They trained the model on these datasets and evaluated it across different dataset combinations, model architectures, and dataset sizes. They also attempted to improve output accuracy through multi-image input and beam search. With both techniques enabled, the model achieved a top-1 accuracy of 62.0% and a top-3 accuracy of 80.3% on the validation set. The study further examines the model's interpretability through saliency maps, its errors, and its calibration, and discusses its limitations and directions for future research.
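To make the decoding and evaluation terms above concrete, the following is a minimal sketch of beam search and top-k accuracy, not the authors' implementation. Everything in it is an illustrative assumption: the function names `beam_search` and `top_k_accuracy`, the hand-written `toy_next_token_probs` table standing in for a trained image-conditioned decoder, and the use of exact string matching to decide whether a prediction is correct. The toy strings contain no rings or branches, so they read the same in SMILES and DeepSMILES.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token


def beam_search(
    next_token_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Return up to `beam_width` finished strings, most probable first."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in next_token_probs(prefix).items():
                if prob <= 0.0:
                    continue
                new_score = score + math.log(prob)  # sum of log-probabilities
                if token == EOS:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (token,), new_score))
        if not candidates:
            break
        # Keep only the best `beam_width` unfinished prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda c: c[1], reverse=True)
    return ["".join(prefix) for prefix, _ in finished[:beam_width]]


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of samples whose target appears among the first k predictions."""
    hits = sum(
        target in preds[:k] for preds, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Toy stand-in for a trained decoder: a fixed next-token table (assumption).
_TOY_TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"C": 1.0},
    ("C",): {"C": 0.6, "O": 0.3, EOS: 0.1},
    ("C", "C"): {"O": 0.7, EOS: 0.3},
}


def toy_next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix missing from the table simply ends the sequence.
    return _TOY_TABLE.get(prefix, {EOS: 1.0})


if __name__ == "__main__":
    ranked = beam_search(toy_next_token_probs, beam_width=3)
    print(ranked)
    print(top_k_accuracy([ranked], ["CO"], k=1))
    print(top_k_accuracy([ranked], ["CO"], k=3))
```

In this toy setting the correct string may be ranked second or third rather than first, which is the situation in which top-3 accuracy exceeds top-1 accuracy. A real evaluation might compare canonicalized structures rather than raw strings; that choice is not specified in the summary above.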